Spring Cloud (17): High Concurrency Design

  • Seckill

    • Preliminary analysis of seckill business
    • The challenge of the seckill system
    • Seckill system design
    • General seckill architecture
      • page access
      • Common seckill system architecture
      • The design and implementation of the seckill system in the mall
      • Seckill isolation
      • business isolation
      • system isolation
      • data isolation
  • actual deployment

    • OpenResty
    • Product acquisition
    • Inventory Acquisition
      • Lua accesses the Redis replica locally (Linux IPC: pipes, anonymous pipes, shared memory, semaphores, message queues, UNIX domain sockets)
    • Flow Control
      • Traffic control in the early stage of seckill
        • Appointment system design
        • Reservation system optimization
          • Head e-commerce platform strategy
      • Traffic control in the event of seckill
    • Peak shaving
    • Rate limiting
      • Nginx rate limiting
      • Application/service-layer rate limiting
      • Gateway flow control
      • Denial of service
    • Purchase limits, inventory deduction, downgrade, and hotspots
      • purchase limit
      • inventory deduction
        • Database scheme (poor performance)
        • Distributed lock scheme
      • High-concurrency deduction: downgrade
        • Write service downgrade (data inconsistency problem): deduct inventory in Redis
        • Read service downgrade
        • Simplify system functions
      • hot data
        • read hotspots
        • write hotspots
      • Brush prevention, risk control and disaster recovery
        • anti-brush
        • risk control
        • disaster recovery
  • Highly concurrent reading and writing practices for large websites

    • High concurrent read and write scenarios
      • Focus on "high concurrent reading" system
      • Focus on "high concurrent writing" system
      • A system with "high concurrent reading" and "high concurrent writing" at the same time
    • High Concurrency Read and Write Strategy
      • High concurrent read
        • add cache / read replicas
        • concurrent read
        • heavy write, light read
        • Read-write separation (CQRS architecture)
      • High concurrent write
        • data sharding
        • asynchronous
        • batch write
    • RocksDB in detail
      • RocksDB vs. Redis
      • How LSM-Tree balances read and write performance
  • Question 1: Why do we need a seckill system? Promotion and attracting traffic

  • Question 2: What status do leading e-commerce platforms such as JD.com and Alibaba give the seckill system? Hit products vs. general merchandise

  • Question 3: What does the seckill system mean to us? Why learn it? High availability, high performance, high concurrency

Seckill

Preliminary analysis of seckill business


The challenge of the seckill system

  • huge instantaneous traffic
  • hot data problem
  • Brush (bot) traffic (e.g. via packet-capture tools)

Seckill system design

The link path traversed by the HTTP request


What happens at each user interaction with the system during a snap-up purchase

  1. Seckill activity data: information about the products participating in the seckill, used mainly on the product details page for the countdown, the start/end display, and snap-up entry validation
  2. Provide the checkout page: it must be embeddable across platforms (Android, PC, iOS), so a complete set of services including H5 pages is needed, mainly to display the snap-up information for the product: name, price, quantity, address, payment method, virtual assets, etc.
  3. Provide the data required to render the settlement page: user-dimension data such as addresses and virtual assets, and activity-dimension data such as names and prices
  4. Provide order placement: the user places the order on the settlement page; generate the order or pass the order data through to downstream services

DNS layer: Large websites will take some network-related anti-attack measures, and the network security department has some unified configuration measures

Nginx layer: reverse proxy and load balancer, high-traffic web server, static resource server, if business verification is also put here, pre-verification can be realized

Web services: business aggregation

RPC service: basic service

General seckill architecture

page access


  • Under seckill-level concurrency, a product details page implemented this way puts enormous pressure on back-end services, especially the product service and the database; even if all product information is cached, it still consumes a lot of back-end resources and bandwidth
  • Without a cache, the database alone is nowhere near enough for this load
  • Redis officially claims 100,000+ QPS on a single node; in production, roughly 70,000-80,000 concurrent operations is more realistic after discounting (and even that struggles with instantaneous traffic bursts)
  • The Tomcat Java web container does not perform well either: one thread per request, and with 4 GB of memory it can start roughly 4,000-5,000 request threads

Common seckill system architecture


  • The CDN takes over the static resources otherwise served by the web services or Nginx
    • CDN nodes are deployed all over the country; clients automatically pull static resources from the nearest node based on their location, which is faster
  • Nginx's responsibilities are enlarged: it sits in front as a web gateway, handles part of the business-logic validation, and may add blacklists/whitelists, rate limiting, and flow control. Examples of writing business logic in Nginx:
    • JD.com uses it as the business gateway for product details and flash sales
    • Meituan uses it as the load-balancing access layer
    • 12306 uses it for ticket queries
  • Make full use of Nginx's high-concurrency, high-throughput capabilities
  • A hallmark of seckill traffic is that the ingress volume is huge but very mixed; among these requests are
    • brush (bot) requests
    • invalid requests (bad parameters and other anomalies)
    • normal requests; in practice the ratio may be around 6:1:3

Only the roughly 30% of genuinely valid requests should reach downstream services; the other 70% should be intercepted at the gateway layer. Otherwise all the traffic hits the web service layer, which starts new threads to process brush and invalid requests, wasting resources.

The design and implementation of the seckill system in the mall


  1. Create an event: The operator creates an event in the operation background of the seckill system according to the specified product, and specifies the start time, end time, event inventory, etc. of the event.
  2. Start the event: before the event begins, the seckill operations backend starts the seckill, writing the homepage seckill activity information to the mall system's Redis Cluster and the seckill product inventory and related data to the seckill system's Redis master-replica cluster.
  3. Start snapping: the user enters the flash sale details page to prepare for the flash sale.
  4. Filter eligibility: the product details page shows the "buy now" button. Whether it can be clicked is controlled by logical checks, e.g. whether a user-level restriction on buying is set, whether activity inventory remains, whether a reservation is required, etc. If nothing blocks them, the user clicks the snap-up button and enters the seckill settlement page.
  5. Confirm order page: on the settlement page the user can change the purchase quantity, switch the address, the payment method, etc. Which settlement elements appear depends on the actual business; more complex scenarios may also support points, coupons, red envelopes, delivery time windows, etc., and these all affect the final price calculation.
  6. Payment page: after confirming, the user submits the order. Here the back-end service can call risk-control and purchase-limit interfaces for further verification; once everything passes, the inventory is deducted and the order is generated.
  7. Payment callback page: after the order completes, jump to the page matching the chosen payment method, e.g. the cashier for online payment, or the order-success page for cash on delivery.

Seckill isolation

  • Seckill isolation strategy: isolate seckill products from general products at the system level
  • Seckill isolation covers business isolation, system isolation, and data isolation

business isolation

  • Report the activity in a reporting system, providing basics such as the product IDs participating in the seckill, start and end times, inventory, purchase-limit rules, risk-control rules, and the geographic distribution, estimated headcount, and membership levels of the participating audience
  • From this, the technical team can estimate the approximate traffic and concurrency and, combined with the capacity the system currently supports, evaluate whether to scale out, downgrade, or adjust the rate-limiting strategy

system isolation

  • A user's seckill starts from the product details page (many e-commerce seckill systems also show a countdown there, with the seckill button becoming clickable when time is up). So the first system to focus on is the product details page: apply for an independent seckill details-page domain name, an independent Nginx load balancer, and an independent details-page backend service.
  • For domain isolation, apply for a dedicated domain name that only carries seckill traffic. Traffic entering via this domain is distributed to a dedicated load balancer and then routed to a dedicated microservice group, achieving application-level traffic isolation from the ingress all the way to the microservices.
  • The systems hit hardest by seckill traffic are the seckill details page, the seckill settlement page, and seckill order inventory deduction; these are the ones to watch. Systems at the end of the link, such as the cashier and payment systems, see relatively low and controllable traffic after upstream peak shaving, so physically isolating them adds cost for little benefit.

data isolation

  • For the Redis cache, one master and one replica are enough in ordinary scenarios, but in a seckill scenario one master with multiple replicas is needed to serve hot-data reads

actual deployment


OpenResty

Detailed use: https://blog.csdn.net/menxu_work/article/details/128400402

Product acquisition

OpenResty serves the static page via Lua

Inventory Acquisition

OpenResty queries Redis directly for inventory


Lua accesses the Redis replica locally (Linux IPC: pipes, anonymous pipes, shared memory, semaphores, message queues, UNIX domain sockets)

Nginx needs to access Redis over the network; can this network access be avoided?

Since the seckill system builds a Redis master-replica cluster anyway, deploy a Redis replica on the same server as Nginx.

When Nginx then accesses Redis, the data travels no further than the IP (loopback) layer of the operating system's network stack, so no packets are actually transmitted on the physical network.

TCP/IP layered model: physical layer, link layer, network layer, transport layer, application layer.

In fact, JD.com has used this design internally; there is an introduction in "The Core Technology of Website Architecture with Hundreds of Millions of Flows" by Zhang Kaitao, pages 351 and 385.


Flow Control

Traffic control in the early stage of seckill

Appointment system design

Roles

  • A reservation management backend to set up and close activities
  • The reservation system sends SMS or push reminders to users who have reserved
  • A terminal-facing reservation core microservice that lets users make and cancel reservations

Database tables

  • Reservation activity table
  • User reservation relationship table

Reservation system optimization

  • time dimension
  • Number of reservations

The reservation system itself has some characteristics of a seckill system, so instantaneous circuit breaking is needed to control the number of participants

Head e-commerce platform strategy

  • Shard the database (split databases and tables), mainly the user reservation relationship table
  • Historical reservation data also needs a scheduled task for carry-over and archiving to reduce database pressure

Reservation activity information table

  • Store it in the Redis cache; if Redis cannot carry the load, scale from one master to multiple replicas, or add a local cache inside the service.

User reservation relationship table

  • Keyed by user, so there is no read-hotspot problem; load the user's reservation relationships into the Redis cache at login or another suitable moment, then read from Redis when displaying a product to tell the user whether they have already reserved it

What happens when a user makes a reservation?

  • Write asynchronously through message middleware, guarding against duplicate and lost messages; meanwhile the front end tells the user "your reservation is queued"

The product details page displays the current number of reservations to create a hot-product atmosphere

  • The natural idea is to keep a reservation-count record in Redis: the product details page reads this record from Redis for the atmosphere display, and each time a user clicks the "Reserve Now" button the key is incremented

A single Redis shard can generally sustain an OPS of 70,000 to 80,000. What happens when the reservation window receives hundreds of thousands of reservations per second?

  • Accumulate in a local cache first, then write to Redis in batches: for example, after accumulating 1,000 reservations, issue a single INCRBY 1000 against Redis, reducing the write pressure on Redis 1,000-fold
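A minimal Java sketch of this batching idea (class and method names are illustrative; in production the `flush` callback would issue the Redis INCRBY, e.g. `jedis.incrBy("reserve:count", n)`, and a periodic task would flush any remainder):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

/** Accumulate increments locally in the JVM and flush to Redis in batches.
 *  `flush` stands in for a single INCRBY call carrying the whole batch. */
class BatchedCounter {
    private final AtomicLong pending = new AtomicLong();
    private final long batchSize;
    private final LongConsumer flush;

    BatchedCounter(long batchSize, LongConsumer flush) {
        this.batchSize = batchSize;
        this.flush = flush;
    }

    void increment() {
        long cur = pending.incrementAndGet();
        // once a full batch has accumulated, one caller wins the CAS and flushes it
        if (cur >= batchSize && pending.compareAndSet(cur, 0)) {
            flush.accept(cur); // one INCRBY instead of `cur` individual INCRs
        }
    }
}
```

A timer would still be needed to flush the tail that never reaches a full batch, otherwise the last few reservations would sit in memory indefinitely.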

Traffic control in the event of seckill

Peak shaving

  1. Traffic peak shaving
  2. CAPTCHA and quiz questions (demo: HappyCaptcha)
  3. Message queue (demo: order queue)

Rate limiting


1. Nginx rate limiting

Rate-limiting modules
- ngx_http_limit_conn_module: limits the number of concurrent connections from one client
- ngx_http_limit_req_module: leaky-bucket algorithm limiting the client's request rate

http { 
    limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s; 
    server {
        location /test-limit {
            default_type application/json;
            limit_req zone=one burst=2 nodelay;
            content_by_lua_block {
                ngx.say("Test limit")
            }
        }
    }
}

limit_req_zone is the directive name, a keyword, and may only be used in the http block

  • $binary_remote_addr is a built-in Nginx variable (the client address in binary form); other built-ins include $remote_port, the client's port number
  • zone=one:10m: zone is a keyword; one is a custom rule name that later directives reference; 10m declares how much shared memory backs the rate limit (a 1 MB zone can hold roughly 16,000 session states)
  • rate=1r/s: rate is a keyword specifying the limiting threshold; r/s is the number of requests allowed per second, here 1 request per second
  • limit_req_zone $binary_remote_addr zone=one:10m rate=5r/s; # same IP, any request URI: enters the zone named one, limited to 5 requests/second
  • limit_req_zone $binary_remote_addr $uri zone=two:10m rate=1r/s; # same IP and same request URI: enters the zone named two, limited to 1 request/second
  • limit_req is the directive name, usable in http, server, and location blocks; it binds a shared memory zone and sets the maximum burst size
  • zone=one: use the zone named one, declared earlier with limit_req_zone
  • burst=2: burst specifies the maximum number of burst requests; requests beyond the rate but within the burst are delayed
  • nodelay: with nodelay set, requests exceeding the burst are rejected immediately and the client normally receives a 503

In seckill scenarios, rate and burst are usually set very low; both can be set to 1, meaning a single IP may send at most one request per second.

Colleagues on the same office network share one public IP, so when joining a top e-commerce seckill it is best to switch to your own mobile network to avoid being "killed by mistake" by the per-IP limit.
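The rate/burst/nodelay behavior can be approximated by a token bucket. Below is a simplified sketch, not nginx's actual implementation (which is a millisecond-granularity leaky bucket); the capacity is modeled as burst + 1, and a caller-supplied clock keeps the demo deterministic:

```java
/** Minimal token bucket approximating `limit_req rate=R burst=B nodelay`:
 *  capacity = burst + 1 tokens, refilled at `rate` tokens per second. */
class TokenBucket {
    private final double capacity;
    private final double ratePerMs;
    private double tokens;
    private long lastMs;

    TokenBucket(double ratePerSec, int burst, long nowMs) {
        this.capacity = burst + 1;
        this.ratePerMs = ratePerSec / 1000.0;
        this.tokens = capacity;
        this.lastMs = nowMs;
    }

    /** True if the request is admitted; false is where nginx would answer 503. */
    synchronized boolean tryAcquire(long nowMs) {
        // refill proportionally to elapsed time, never beyond capacity
        tokens = Math.min(capacity, tokens + (nowMs - lastMs) * ratePerMs);
        lastMs = nowMs;
        if (tokens >= 1.0) { tokens -= 1.0; return true; }
        return false;
    }
}
```

With rate=1r/s and burst=2, three requests arriving at the same instant are admitted, the fourth is rejected, and one more slot opens up after a second.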

2. Application/service-layer rate limiting
  • thread pool limits
  • API rate limiting
    • Google's open-source Guava RateLimiter: write your own token-bucket-based rate-limiting annotation and implementation, and use it in the business API code
    • Sentinel traffic governance: covers traffic routing, flow control, traffic shaping, circuit breaking and degradation, adaptive system overload protection, hotspot traffic protection, and more
  • Custom rate limiting
    • A thread-safe ConcurrentLinkedQueue pre-stores a batch of order IDs
    • A scheduled task refreshes it: int getCount = ORDER_COUNT_LIMIT_SECOND / (1000 / FETCH_PERIOD); at 2,000 orders per second with a 100 ms refresh period, each refresh fetches 2000 / (1000 / 100) = 200 IDs
    • Obtaining a unique ID acts as the rate limiter: String orderIdStr = orderIdList.poll(); the 2,000 orders are spread evenly across the interval
  • Hierarchical filtering
    • Terminate requests at Nginx as early as possible by enabling a local cache: lua_shared_dict stock_cache 1m; # a shared dict (local cache) named stock_cache, 1 MB in size; accessed via ngx.shared.stock_cache get/set
    • Front end: "The seckill item is out of stock; the seckill has ended"
    • Service layer: "This item is sold out, please buy other items!"
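The custom queue-based limiter described above can be sketched as follows (class and method names are illustrative; a real implementation would refill from a ScheduledExecutorService every FETCH_PERIOD milliseconds):

```java
import java.util.concurrent.ConcurrentLinkedQueue;

/** Permit pool: a queue pre-filled with order IDs. Callers that fail to
 *  poll an ID are rejected, so the queue size per interval is the limit. */
class OrderIdLimiter {
    private final ConcurrentLinkedQueue<String> orderIds = new ConcurrentLinkedQueue<>();

    /** Refill task: e.g. 200 permits fetched every 100 ms for a 2,000/s cap. */
    void refill(int count, long batchNo) {
        for (int i = 0; i < count; i++) orderIds.offer("order-" + batchNo + "-" + i);
    }

    /** Returns an order ID to proceed with, or null when the quota is exhausted. */
    String tryAcquire() { return orderIds.poll(); }
}
```

Because the ID doubles as the order identifier, acquiring a permit and allocating the order ID happen in one lock-free poll.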
3. Gateway flow control
  • Solution 1: rate limiting based on Redis + Lua scripts
    The gateway officially provides a RequestRateLimiter filter factory, which implements token-bucket rate limiting via Redis + Lua scripts
  • Solution 2: integrate Sentinel rate limiting
    Use Sentinel's gateway flow-control feature to protect traffic at the gateway entrance or limit the call frequency of APIs.
    How Spring Cloud Gateway integrates Sentinel to implement rate limiting:

    • route dimension: the route entries configured in the Spring configuration file; the resource name is the corresponding routeId
    • Gateway custom API dimension: users can use the API provided by Sentinel to define their own API groups

  • Scenario: flow control on the seckill order interface. Note: the uniform-rate queueing mode does not yet support scenarios with QPS > 1000
  • Scenario: hotspot-parameter rate limiting on the product details interface. Hotspot-parameter limiting tracks the hot parameters among incoming arguments and, according to the configured threshold and mode, throttles resource calls that contain them. Note: hotspot rules require the @SentinelResource("resourceName") annotation, otherwise they do not take effect; only parameters of the 7 basic data types take effect
  • Hotspot detection: hotkey
4. Denial of service

Sentinel system-rule protection

  • Load adaptive (only valid on Linux/Unix-like machines): the system's load1 serves as the heuristic indicator for adaptive system protection. When load1 exceeds the configured threshold and the current number of concurrent threads exceeds the estimated system capacity, system protection (the BBR phase) is triggered. System capacity is estimated as the system's maxQps * minRt. The reference setting is usually CPU cores * 2.5.
  • CPU usage (since 1.5.0): system protection triggers when system CPU usage exceeds the threshold (range 0.0-1.0); this is more sensitive.
  • Average RT: triggers when the average RT of all ingress traffic on a single machine reaches the threshold, in milliseconds.
  • Concurrent thread count: triggers when the number of concurrent threads across all ingress traffic on a single machine reaches the threshold.
  • Ingress QPS: triggers when the QPS of all ingress traffic on a single machine reaches the threshold.

Purchase limits, inventory deduction, downgrade, and hotspots

purchase limit

  • Product-dimension limits: e.g. different regions
  • Personal-dimension limits: not just the same user ID; also restrict by the same mobile number, the same delivery address, the same device, the same IP, and other dimensions. For example, one mobile number may place only 1 order per day, each order may contain only 1 unit, and at most 2 units may be bought per month.
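A minimal in-memory sketch of such a personal-dimension limit (illustrative only; in production these counters would live in Redis with a daily expiry so that every service instance shares them):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Per-(phone, day) order counter enforcing a daily cap. */
class PurchaseLimiter {
    private final ConcurrentHashMap<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    /** Atomically counts one order for (phone, day); false once maxPerDay is reached. */
    boolean tryOrder(String phone, String day, int maxPerDay) {
        AtomicInteger c = counters.computeIfAbsent(phone + ":" + day, k -> new AtomicInteger());
        for (;;) { // CAS loop so the cap is never exceeded under concurrency
            int cur = c.get();
            if (cur >= maxPerDay) return false;
            if (c.compareAndSet(cur, cur + 1)) return true;
        }
    }
}
```

The same shape works for the address, device, and IP dimensions; only the key changes.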

inventory deduction

Inventory deduction must be atomic and ordered. How can that be achieved?

1. Database solution (poor performance)
  • Row-lock mechanism
    • Pessimistic lock: put the query and the deduction in one transaction, query the stock with for update, and release the row lock when the transaction ends
    • Via the SQL statement itself: e.g. a where condition that guarantees the stock never drops below 0
  • Optimistic lock
    • version number: update set stock = stock - ?, version = version + 1 where id = ? and version = ?
  • Database features
    • Declare the stock column as an unsigned integer, so a deduction that would take the value below zero makes the SQL statement itself report an error
2. Distributed lock solution
  • Redis
  • ZooKeeper

Disadvantages:

  • Lock validity (expiry) issues
  • The NPC problems of RedLock: Network delay, Process pauses, Clock drift
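For comparison, the optimistic-lock scheme above can be simulated in plain Java; `updateIfVersion` plays the role of the conditional UPDATE, and the caller retries on version conflicts (names are illustrative):

```java
/** In-memory stand-in for one stock row with a version column. */
class StockRow {
    volatile int stock;
    volatile int version;
    StockRow(int stock) { this.stock = stock; }

    /** Simulates: UPDATE ... SET stock = stock - ?, version = version + 1
     *  WHERE id = ? AND version = ? — succeeds only if the version matches. */
    synchronized boolean updateIfVersion(int expectedVersion, int qty) {
        if (version != expectedVersion || stock < qty) return false;
        stock -= qty;
        version++;
        return true;
    }
}

class OptimisticDeduct {
    static boolean deduct(StockRow row, int qty, int maxRetries) {
        for (int i = 0; i < maxRetries; i++) {
            int v = row.version;              // SELECT stock, version
            if (row.stock < qty) return false; // insufficient inventory
            if (row.updateIfVersion(v, qty)) return true; // conditional UPDATE
        }
        return false; // too much contention: every attempt lost the version race
    }
}
```

The retry loop is exactly why this scheme degrades under seckill-level contention: most attempts lose the version race and burn a database round trip each time.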

High-concurrency deduction: downgrade

Downgrading is generally lossy, so something must be sacrificed. Several common downgrades:

  • Write service downgrade: sacrifice data consistency to obtain higher performance
  • Read service downgrade: emergency degradation for quick loss-stopping in failure scenarios
1. Write service downgrade (data inconsistency problem): deduct inventory in Redis
  • With multiple data sources (MySQL and Redis) there are data-consistency issues, unless distributed transactions are introduced
  • Under low traffic, writes land in the MySQL database first, and the Redis cache is updated by listening to the database's binlog changes. The cache lets us carry higher-traffic reads, but writes are still bounded by the database's disk IOPS; a single database generally supports 3,000-5,000 TPS of writes
  • When traffic surges, the write path above must be downgraded: instead of writing synchronously to the database, write synchronously to the Redis cache and asynchronously to the database via MQ, using Redis's much higher OPS to carry the traffic. A single Redis shard generally reaches 80,000-100,000 OPS, and a Redis cluster even more

Redis executes commands on a single thread; to prevent overselling, the stock query and the deduction must form a single atomic operation.

A reference Lua script follows, presented as one line of comment per line of code:

 -- ------------------- Lua script: begin -------------------
 -- Call Redis GET to read the activity stock; KEYS[1] is parameter 1, the stock key
 local c_s = redis.call('get', KEYS[1])
 -- Check whether the activity stock is sufficient; KEYS[2] is parameter 2, the quantity being bought
 if not c_s or tonumber(c_s) < tonumber(KEYS[2]) then
    return 0
 end
 -- If the stock is sufficient, deduct it; KEYS[2] is parameter 2, the quantity being bought
 redis.call('decrby', KEYS[1], KEYS[2])
 -- Return 1 to signal a successful deduction (without an explicit return the caller would see a nil reply)
 return 1
 -- ------------------- Lua script: end -------------------

What if Redis goes down too?

 * Seckill orders deduct inventory directly in the Redis cache
 * MQ asynchronously persists the order to the DB, shaving the peak under high concurrency
 * If sending messages to MQ also becomes a performance bottleneck, introduce a thread pool and send the messages asynchronously
 * But Redis and this service may go down at the same time, which would lose data
 * So deduction records need fast persistence: use a WAL mechanism and save them to a local RocksDB database
2. Read service downgrade

When designing a high-availability system, keep in mind that the external middleware or other RPC services a microservice depends on may fail at any time, so build multi-level caches that allow timely degradation and loss-stopping when a failure occurs.

Suppose the seckill Redis cache fails: a downgrade switch can quickly redirect read requests to a backup cache such as MongoDB or ES. And if Redis and the backup cache fail at the same time (in reality, simultaneous failures are rare), the downgrade switch can still route traffic to the database, letting the database temporarily take the pressure to keep serving read requests.

3. Simplify system functions

Simplifying system functions means eliminating unnecessary processes and abandoning non-core functions.

The seckill system should be as simple as possible: the less the interaction, the smaller the data, and the shorter the link, the closer it is to the user and the faster it responds. Non-core functions can therefore be downgraded in the seckill scenario.

hot data

Within a unit of time (say 1 s), data that is accessed very frequently is hot data; everything else is ordinary or cold data. How high a frequency per unit time counts as hot? There is no strict definition; decide based on your own system's throughput.

When a popular product is on flash sale, only that SKU is the hotspot, so no matter how you shard the database and tables or add shards to the Redis cluster, the capacity of the single shard that the hot SKU lands on does not improve. It will eventually hit its ceiling and that Redis node will go down, potentially ending in cache breakdown and a system-wide avalanche. How should we solve this thorny hotspot problem?

read hotspots

  1. Increase the number of copies of hot data: add more Redis replicas
  2. Move hot data as close to the user as possible: push the hot data up the stack and keep a local cache of it inside the service
  3. Short-circuit and return directly: if a SKU in a seckill does not support coupons, the coupon system can immediately return an empty coupon list for that SKU code

write hotspots

Incrementing the Redis key for the "reservation count" becomes a hot write operation when millions of users reserve at the same time

  1. Accumulate in JVM memory first and submit to Redis lazily, cutting Redis OPS by dozens of times
  2. For inventory deduction, one line of thinking is to split a single hot key into multiple keys to avoid the hotspot. This design applies to both MySQL and Redis caches, but it involves subdividing the inventory and moving sub-inventories around, which is complicated and full of boundary cases that easily make the inventory inaccurate, so use this method with caution
  3. Deduct a single SKU's inventory directly on a single Redis shard. Deduction sits at the end of the seckill link: after the earlier peak shaving and rate limiting, the traffic that actually reaches inventory is limited, and a single Redis shard's OPS can carry it. On top of that, we can rate-limit the inventory deduction of a single SKU individually to bound the pressure on that shard. With this two-pronged approach, the Redis deduction pressure for a single SKU stays controllable
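The key-splitting idea in point 2 can be sketched like this, with an array standing in for N sub-keys placed on different Redis shards (illustrative only; a real version would also need the tricky part the text warns about: rebalancing leftover sub-inventory between shards):

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicIntegerArray;

/** One hot stock key split into N sub-stocks, one per shard. */
class ShardedStock {
    private final AtomicIntegerArray shards;

    ShardedStock(int totalStock, int shardCount) {
        shards = new AtomicIntegerArray(shardCount);
        // distribute the total as evenly as possible across sub-keys
        for (int i = 0; i < shardCount; i++)
            shards.set(i, totalStock / shardCount + (i < totalStock % shardCount ? 1 : 0));
    }

    /** Try a random shard first, then fall back to scanning the others. */
    boolean deductOne() {
        int start = ThreadLocalRandom.current().nextInt(shards.length());
        for (int i = 0; i < shards.length(); i++) {
            int idx = (start + i) % shards.length();
            for (;;) {
                int cur = shards.get(idx);
                if (cur <= 0) break; // this sub-stock is empty, try the next
                if (shards.compareAndSet(idx, cur, cur - 1)) return true;
            }
        }
        return false; // every sub-stock is exhausted
    }
}
```

The random starting shard is what spreads the write pressure; the fallback scan is one of the boundary cases that makes the scheme hard to get exactly right at scale.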

Brush prevention, risk control and disaster recovery

anti-brush

  1. Using physical tools, such as a "golden finger" device that taps the phone's snap-up button
  2. Using third-party software that triggers the snap-up button in the app exactly on time
  3. Capturing and analyzing the snap-up interfaces, then simulating the whole snap-up flow with a program

What this traffic has in common is speed: it can only be identified through multi-dimensional risk-control verification, unless it skips steps in the flow. Countermeasures:

  • Nginx conditional rate limiting is a very simple, direct method. It effectively handles black traffic's high-frequency requests against a single interface, but to catch brushes that place orders directly without going through the preceding pages, a token mechanism for flow orchestration must be introduced
  • Token mechanism: generate and verify the token at the Nginx layer, so the main business flow and its data are not intruded upon.
    For example, add the flow token to the response header via header_filter_by_lua_block. The token can be an MD5 over the product ID, the activity start time, a custom encryption key, and so on
  • Blacklist mechanism: local blacklists and cluster blacklists. There are two sources: one is imported from outside, e.g. from risk control or other channels; the other is self-generated and self-consumed
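A sketch of such a flow token (the inputs and the secret are illustrative; in the real design Nginx/Lua generates it in header_filter_by_lua_block when the details page is served, and verifies it before the order step, so a request that skipped the page carries no valid token):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

/** Flow-orchestration token: MD5 over product ID, activity start time,
 *  and a private key known only to the gateway. */
class FlowToken {
    static String make(String skuId, long activityStartMs, String secret) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] d = md5.digest((skuId + ":" + activityStartMs + ":" + secret)
                    .getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : d) sb.append(String.format("%02x", b)); // hex-encode
            return sb.toString();
        } catch (Exception e) { throw new IllegalStateException(e); }
    }

    /** The order step recomputes the token and rejects mismatches. */
    static boolean verify(String token, String skuId, long activityStartMs, String secret) {
        return make(skuId, activityStartMs, secret).equals(token);
    }
}
```

Since the token is deterministic per activity, rotating the secret per activity (or adding a user-bound component) is what keeps replayed tokens from being shared between brush accounts.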

Risk control

Risk control is the process of continuously refining user profiles, and user profiles are the foundation of risk control.

A basic user profile includes mobile number, device ID, identity, IP, address, etc.; extended information can include credit records, shopping records, fulfillment records, work information, social-security information, and so on. No single platform can collect all of this, which is why building risk control requires many platforms, broad services, and deep coverage: only then can you gather as much user data as possible.

Risk control, then, means checking whether, for a given user in a given business scenario, some item in the user profile, or some combination of items, has crossed a red line. The more complete the user profile, the more accurate the judgment of illegitimate users.

disaster recovery

  • Intra-city active-active
  • Cross-region multi-active

Highly concurrent reading and writing practices for large websites

High concurrent read and write scenarios

Focus on "high concurrent reading" system

  • Magnitude
  • Response time
  • frequency

Scenarios:

  1. search engine
  2. Product search for e-commerce
  3. Product descriptions, pictures and prices for e-commerce systems

Focus on "high concurrent writing" system

The three monetization models of the Internet:

  • game
  • e-commerce
  • Advertisement (advertising fee deduction system)

Ads are usually either pay-per-view or pay-per-click (known in the industry as CPC or CPM). Specifically, advertisers open an account on the advertising platform, charge a sum of money into it, and then place their own advertisements. After C-end users see this advertisement, they may deduct one yuan per click (CPC); or browse this advertisement, and deduct 10 yuan (CPM) for 1000 views. Here is just an analogy, the actual operation is of course not at this price.

Such an advertising billing system (ie deduction system) is a typical "high concurrent writing" system


  1. Every time a C-end user views or clicks an ad, the advertiser's account balance is deducted once.
  2. The deduction should be as close to real-time as possible. If deduction lags, an advertiser whose account is already empty may still have ads playing online, which wastes the platform's traffic.
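As a rough illustration of the deduction logic above, here is a minimal in-memory sketch. The account model, prices, and function names are all hypothetical; a real billing system would perform these deductions against a database or cache under heavy concurrency:

```python
from dataclasses import dataclass

@dataclass
class AdvertiserAccount:
    balance_cents: int  # keep money in integer cents to avoid float rounding

def charge_cpc(account: AdvertiserAccount, price_per_click_cents: int) -> bool:
    """Deduct once per click; refuse if the balance is exhausted."""
    if account.balance_cents < price_per_click_cents:
        return False  # signal the ad server to stop serving this ad
    account.balance_cents -= price_per_click_cents
    return True

def charge_cpm(account: AdvertiserAccount, price_per_mille_cents: int, views: int) -> bool:
    """Deduct proportionally per view: the price covers 1000 impressions."""
    cost = price_per_mille_cents * views // 1000
    if account.balance_cents < cost:
        return False
    account.balance_cents -= cost
    return True

acct = AdvertiserAccount(balance_cents=10_000)  # 100 yuan on the account
charge_cpc(acct, 100)         # one click at 1 yuan
charge_cpm(acct, 1_000, 500)  # 500 views at 10 yuan per 1000 views
print(acct.balance_cents)     # 10000 - 100 - 500 = 9400
```

Returning `False` as soon as the balance cannot cover a charge is what lets the platform pull the ad promptly instead of serving it for free.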

A system with "high concurrent reading" and "high concurrent writing" at the same time

Scenes:

  • 12306 website's train ticket sales system
  • E-commerce inventory system and seckill system
  • Payment system and WeChat red envelope
  • IM, Weibo and Moments


High Concurrency Read and Write Strategy

High concurrent read

1. Add caches / read replicas

  • Solution 1: Local cache or centralized cache
    There are four main design patterns for updating a cache: Cache Aside, Read Through, Write Through, and Write Behind Caching:

    • Cache Aside Pattern: the application updates the DB first and then deletes the cache entry; the caller, not the storage, maintains the cache.
    • Read Through: on a cache miss (the entry has expired or been evicted by LRU), the cache itself loads the value from the backing store and returns it; to the caller the backend looks like a single storage that maintains its own cache.
    • Write Through: on an update, the cache synchronously writes the new value through to the backing store.
    • Write Behind Caching: updates land in the cache and are flushed to the backing store asynchronously in batches; the Page Cache of the Linux file system uses the same algorithm.

    Cache problems (large companies typically cache everything):

    • Cache high availability: if the cache goes down, do all requests fall through to the database and overwhelm it?
    • Cache penetration: the cache is up, but there are many queries for keys that are not in the cache (and typically not in the database either), so a flood of requests reaches the database in a short time.
    • Cache breakdown: a hotspot key under intense concurrent access expires at some moment, and the sustained concurrency breaks through the cache and hits the database directly.
    • Cache avalanche: a large number of hot keys expire at once; as with penetration, many requests pour onto the database in a short time and overwhelm it. The first problem (a cache outage) can also be regarded as a form of avalanche.
  • Solution 2: MySQL Master/Slave

  • Solution 3: CDN/static file acceleration/dynamic and static separation

Note: Although Redis replicas, MySQL slaves, and CDNs are completely different technologies, from a strategy point of view they are all forms of "adding caches/replicas": data redundancy that trades space for time.
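A minimal sketch of the Cache Aside pattern described above, using plain dicts to stand in for Redis and MySQL (all names and keys are illustrative):

```python
# Cache Aside sketch: the application manages the cache itself.
cache: dict = {}                 # stands in for Redis
db = {"sku:1": "iPhone"}         # stands in for MySQL

def read(key):
    # Read path: try the cache first; on a miss, load from the DB and backfill.
    if key in cache:
        return cache[key]
    value = db.get(key)
    if value is not None:
        cache[key] = value
    return value

def write(key, value):
    # Write path: update the DB first, then DELETE (not update) the cache
    # entry, so a concurrent reader cannot pin a stale value.
    db[key] = value
    cache.pop(key, None)

print(read("sku:1"))        # miss -> DB -> backfill: iPhone
write("sku:1", "iPhone 15")
print(read("sku:1"))        # cache was invalidated, reloaded: iPhone 15
```

Deleting rather than updating the cache on write is what distinguishes Cache Aside from Write Through: the next read repopulates the entry from the authoritative store.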

2. Concurrent reads

  • Scenario 1: Asynchronous RPC (issue the sub-calls T1, T2, T3 in parallel)
  • Scenario 2: Redundant requests ("hedged requests")
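A hedged (redundant) request can be sketched with asyncio: fire the primary call, and if it has not answered within a small deadline, fire a backup to another replica and take whichever returns first. The replica names and delays below are made up for illustration:

```python
import asyncio

async def fetch(replica: str, delay: float) -> str:
    # Stand-in for an RPC to one read replica; `delay` simulates latency.
    await asyncio.sleep(delay)
    return f"data from {replica}"

async def hedged_fetch(hedge_after: float = 0.05) -> str:
    # Fire the primary request; if it has not answered within
    # `hedge_after` seconds, fire a backup and take the first result.
    primary = asyncio.create_task(fetch("replica-1", delay=0.2))
    done, _ = await asyncio.wait({primary}, timeout=hedge_after)
    if done:
        return primary.result()
    backup = asyncio.create_task(fetch("replica-2", delay=0.01))
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:   # drop the slower call; its answer is redundant
        task.cancel()
    return done.pop().result()

print(asyncio.run(hedged_fetch()))  # the backup replica answers first
```

The trade-off is classic redundancy: a small amount of extra load on the replicas in exchange for cutting off the tail latency of a slow node.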

3. Heavy writes, light reads (precompute at write time)

  • Solution 1: Weibo's feed stream, implemented with a combined push-pull model

  • Solution 2: Multi-table join queries served from pre-joined wide tables or a search engine

4. Summary: read-write separation (CQRS architecture)

Typical model of read-write separation architecture

  • Design separate data structures for reading and writing.
  • The write side is usually the online business DB, which withstands write pressure by splitting databases and tables. The read side, to withstand high-concurrency reads, may be a <K, V> cache tailored to the business scenario, a pre-joined wide table, or an ES search engine.
  • Connecting reads and writes: a timed task periodically converts the data in the business database into a structure suited to high-concurrency reads; or the write side publishes data changes to message middleware and the read side consumes them; or the read side listens directly to the business database's Binlog and updates itself whenever the data changes.
  • Reads lag writes. Because the data on the write side changes in real time, the data on the read side is necessarily delayed; reads and writes are eventually consistent rather than strongly consistent, which does not affect normal business operation.
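The read-write concatenation above can be sketched as follows, with a rebuild function standing in for the timed task or Binlog listener (all names are hypothetical):

```python
# CQRS sketch: the write side is a normalized append-only "orders" table;
# a sync step folds changes into a denormalized read model keyed for
# fast lookup. The read model is eventually consistent.
orders = []       # write side: the business DB
read_model = {}   # read side: user_id -> total spent

def place_order(user_id: str, amount: int):
    orders.append({"user": user_id, "amount": amount})

def sync_read_model():
    # In production this would consume Binlog events or MQ messages;
    # a full rebuild is what a periodic batch job might do.
    read_model.clear()
    for o in orders:
        read_model[o["user"]] = read_model.get(o["user"], 0) + o["amount"]

place_order("u1", 30)
place_order("u1", 70)
print(read_model.get("u1"))  # None: reads lag writes until the sync runs
sync_read_model()
print(read_model["u1"])      # 100
```

The gap between the two prints is exactly the eventual-consistency window the last bullet describes.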

High concurrent write

Strategy 1: Data Sharding

Strategy 2: Asynchronous

  • Case 1: SMS verification code registration or login
  • Case 2: E-commerce order system
  • Case 3: Advertising billing system
  • Case 4: Write memory + Write-Ahead log
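Case 4 can be sketched as follows: append each change to a log file with a cheap sequential write before applying it in memory, so a crash can be recovered by replaying the log. This is a simplified illustration, not how any particular engine implements it:

```python
import os
import tempfile

class KVStore:
    """Write memory + Write-Ahead Log: log first, then apply in memory."""

    def __init__(self, wal_path: str):
        self.wal_path = wal_path
        self.mem: dict = {}

    def put(self, key: str, value: str):
        # 1. Durable sequential append to the log.
        with open(self.wal_path, "a") as wal:
            wal.write(f"{key}={value}\n")
            wal.flush()
            os.fsync(wal.fileno())
        # 2. Fast in-memory apply; readers see this immediately.
        self.mem[key] = value

    @classmethod
    def recover(cls, wal_path: str) -> "KVStore":
        # Replay the log to rebuild in-memory state after a crash.
        store = cls(wal_path)
        with open(wal_path) as wal:
            for line in wal:
                key, _, value = line.rstrip("\n").partition("=")
                store.mem[key] = value
        return store

path = os.path.join(tempfile.mkdtemp(), "store.wal")
s = KVStore(path)
s.put("stock:1001", "99")
restored = KVStore.recover(path)   # simulate a restart
print(restored.mem["stock:1001"])  # 99
```

Because the log is append-only, every durable write is sequential, which is the whole point of the WAL trick that MySQL and RocksDB both rely on.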

Strategy 3: Write in batches

  • Case 1: Merged deductions in the advertising billing system
  • Case 2: MySQL's small-transaction merging mechanism
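The merged-deduction idea of Case 1 can be sketched like this: buffer per-click deductions and flush them as one combined write per advertiser, trading a little latency for far fewer database round trips. All names and the batch size are illustrative:

```python
from collections import defaultdict

class BatchingDeductor:
    """Buffer click deductions and flush them as merged writes."""

    def __init__(self, batch_size: int = 3):
        self.batch_size = batch_size
        self.buffer = []                  # (advertiser, cents) events
        self.balances = {"adv-1": 1000}   # stands in for the DB
        self.db_writes = 0                # counts actual DB round trips

    def record_click(self, advertiser: str, cents: int):
        self.buffer.append((advertiser, cents))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Merge buffered events so each advertiser gets ONE combined write.
        merged = defaultdict(int)
        for advertiser, cents in self.buffer:
            merged[advertiser] += cents
        for advertiser, total in merged.items():
            self.balances[advertiser] -= total
            self.db_writes += 1
        self.buffer.clear()

d = BatchingDeductor(batch_size=3)
for _ in range(3):
    d.record_click("adv-1", 100)      # three clicks at 1 yuan each
print(d.balances["adv-1"], d.db_writes)  # 700 1: three clicks, one DB write
```

A real system would also flush on a timer so buffered money is never held too long, which is the same latency/throughput trade-off as MySQL's transaction merging.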

RocksDB

For its transaction mechanism, MySQL uses WAL (Write-Ahead Logging): all modifications are first written to the log and only then applied to the system. The log usually contains two parts, Redo and Undo; the well-known Redo log files are typically named ib_logfile0, ib_logfile1, and so on. MySQL implements WAL with a very elaborate mechanism: a dedicated log format, many log types, group writes, etc.

RocksDB is a high-performance, persistent KV storage engine open-sourced by Facebook, originally developed by Facebook's database engineering team on top of Google's LevelDB. Generally speaking, we rarely see a project use RocksDB directly to store data, and it will probably never be used directly by business systems the way Redis is.

https://www.influxdata.com/blog/benchmarking-leveldb-vs-rocksdb-vs-hyperleveldb-vs-lmdb-performance-for-influxdb/

Batch-writing 50 million records took RocksDB only 1m26.9s.

RocksDB vs Redis

            Official throughput   Actual throughput   Storage medium
  Redis     500,000 ops/s         ~100,000 ops/s      memory
  RocksDB   200,000 ops/s         ~40,000 ops/s       disk

Why can RocksDB achieve such high write performance?
Most storage systems use trees or hash tables so that lookups are fast, which forces each write to land in a specific location. For example, writing a record into a B+ tree means placing it under a fixed node according to the tree's sort order; a hash table is similar, and the record must go into a specific hash slot.

Such data structures force writes to scatter across the disk, a bit here and a bit there: what we call "random writes".

MySQL puts a lot of effort into reducing random reads and writes, whereas RocksDB's data structure guarantees that most operations written to disk are sequential writes.

Kafka also relies on sequential reads and writes, which is why its performance is so good. Everything has trade-offs, though: a raw sequential log basically cannot be queried, because the data has no structure and can only be traversed.

How can RocksDB keep good query performance while still writing data to disk sequentially?
By using the LSM-Tree data structure.

How LSM-Tree balances read and write performance

The full name of LSM-Tree is Log-Structured Merge-Tree, a fairly complex composite data structure.
It contains:

  • a WAL (Write-Ahead Log), as in MySQL
  • a SkipList, as in Redis's skip list
  • hierarchical sorted tables (Sorted String Table, SSTable)

LSM-Tree is designed specifically for key-value storage systems, improving write performance at the expense of some read performance; it is usually suited to write-heavy, read-light scenarios.
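To make the structure concrete, here is a toy LSM-style store: writes go to an in-memory memtable, which is flushed as a sorted immutable run (an "SSTable") when full; reads check the memtable first, then the runs newest-first. This sketch omits the WAL, compaction, and bloom filters of a real LSM-Tree:

```python
import bisect

class TinyLSM:
    """Toy LSM-Tree: memtable + sorted immutable runs, newest-first reads."""

    def __init__(self, memtable_limit: int = 2):
        self.memtable: dict = {}
        self.sstables: list = []   # each: sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # One sequential write of a sorted run; the newest run goes first.
        self.sstables.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:           # freshest data wins
            return self.memtable[key]
        for table in self.sstables:        # then scan runs newest-first
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

lsm = TinyLSM()
lsm.put("a", 1)
lsm.put("b", 2)    # memtable full: flushed as a sorted run
lsm.put("a", 10)   # newer value for "a" lives only in the memtable
print(lsm.get("a"), lsm.get("b"))  # 10 2
```

Every flush is a sequential write of already-sorted data, which is how the LSM-Tree earns its write performance; reads pay for it by possibly touching several runs, which real engines mitigate with compaction and bloom filters.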

Architecture diagram:

https://www.processon.com/view/link/637f182e5653bb3a8420f9af


Origin blog.csdn.net/menxu_work/article/details/128398724