"How to optimize the performance of tens of millions and billions of data"

Problem Scenario Introduction

As of 2021, vivo had covered more than 400 million users worldwide, serving more than 60 countries and regions. vivo's market share ranks among the top three in the Philippines, Malaysia, India, and other countries, and its domestic shipments have consistently stayed in the lead. In the third quarter of 2021, vivo also ranked among the domestic top three in the high-end segment (phones priced above 4,000 RMB).

To be fair, those are their 2021 numbers; the architecture described here is roughly their 2018 one, when order volume was only around 10 million. The data volume of the vivo mall at that time may not look impressive, but it happens to make perfect learning material.

VIVO Mall Problem Scenario

Since 2017, with rapid user growth, the monolithic architecture of the vivo official mall v1.0 gradually exposed its drawbacks:

  • Modules were getting bloated

  • Development efficiency was low

  • Performance bottlenecks appeared

  • System maintenance was difficult

The order module is the transaction core of an e-commerce system. Its accumulated data was about to hit the limit of single-table storage, and the system could barely support the traffic of new product launches and promotions: a service-oriented transformation was imperative.

Macro introduction of optimization measures

Optimization 1: Decoupling of business architecture

The v2.0 architecture upgrade and comprehensive decoupling started in 2017, covering business module decoupling and service-oriented transformation:

  • Business module decoupling: vertically and physically splitting the system along business module boundaries.

  • Service-oriented transformation: building microservices on top of the module split; the separated business lines each perform their own duties, provide service-oriented capabilities, and jointly support the main site business.


Optimization 2: Optimization for large data volumes

With the continuous accumulation of historical orders, the data volume of the order table in MySQL reached tens of millions of rows in 2017, and the order data accumulated since then far exceeds the 100-million level.

For problems with a large amount of data, the following optimizations are made:

  • Data archiving

  • Sub-tables

Optimization 3: Optimization for high throughput

The mall business was growing fast: order volume repeatedly hit new highs, and business complexity kept increasing.

Applications sent more and more requests to MySQL, but a single MySQL instance has limited processing capacity.

When the pressure gets too high, the response time of every request degrades, and the database may even be brought down.

Solutions for high concurrency include:

  • Use caching

  • Read-write separation

  • Sub-databases

Optimization 4: Data consistency optimization for high-speed search engines

To support aggregated, high-speed order search, order data is stored redundantly in Elasticsearch.

So how do you keep the order data in ES incrementally consistent with the order data in MySQL?

They chose between the following two options:

  • MQ scheme

  • Binlog scheme

They did not choose the Binlog solution, even though it intrudes little on business code and does not affect the performance of the service itself; instead, they chose the lower-latency MQ solution.

Optimization 5: Reasonable selection of database migration measures

How to migrate data from the original single-instance database to the new database cluster is also a major technical challenge.

There are two issues to consider:

  • The correctness of the data must be ensured;

  • It must also be guaranteed that, at any point during the migration, a problem can be rolled back quickly.

They considered two options:

  • Downtime migration

  • Non-stop migration

Their choice was pragmatic: they did not chase grandeur.

Considering that the non-stop solution would cost much more to build, while a night-time shutdown would cause little business loss, the downtime migration plan was finally selected.

This is the textbook choice.

Optimization 6: Reasonable selection of distributed transaction schemes

Moving from a monolithic architecture to microservices, what happens to data consistency?

The ACID transactions of a single database can no longer be relied on across services, so distributed transactions are required.

There are many solutions for distributed transactions. For details, see the following blog post:

Distributed Transactions (Understand in Seconds), on the CSDN blog of 40-year-old senior architect Nin.

Among mainstream industry solutions, two-phase commit (2PC) and three-phase commit (3PC) address strong consistency, while TCC, local messages, transactional messages, and best-effort notification address eventual consistency.

Given the high-concurrency scenario, they chose the local message table solution:

within the local transaction, the asynchronous operation to be executed is recorded in a message table; if execution fails, a scheduled task compensates for it.

Optimization 7: Some other detail optimizations

  • For example, ES recall optimization

  • For example, message ordering optimization

  • For example, sharding-jdbc paging query optimization, etc.

Specific introduction of optimization measures

  1. Business Architecture Decoupling

Business architecture decoupling vertically and physically splits the system along business module boundaries, and
the split business lines each perform their own duties, provide service-oriented capabilities, and jointly support the main site business.
Accordingly, the former order module was carved out of the mall into an independent order system, providing standardized services such as ordering, payment, logistics, and after-sales to the mall's other systems.
Module decoupling is followed by database decoupling, so the order module uses an independent database.
In high-concurrency scenarios, module decoupling is in turn followed by service decoupling (microservices).
After service decoupling comes team decoupling: the split-out business lines each perform their own duties.

To sum up, there are actually four major decouplings:

  • Module decoupling

  • Database decoupling

  • Service decoupling

  • Team decoupling (line of business decoupling)

After the four major decouplings, the architecture of the order system is shown in the figure below:

So what did the four decouplings achieve?

  • The split business lines perform their own duties, and iteration efficiency is greatly improved

  • The system copes better with ultra-high concurrency and ultra-large-scale data storage; each business line can adopt solutions tailored to its own domain and tackle production problems in a more effective, targeted way.

  2. Optimization for Large Data Volumes

With the continuous accumulation of historical orders, the order table in MySQL reached tens of millions of rows in 2017, and order data after 2017 far exceeds the 100-million level.
As you may know, the InnoDB storage engine organizes a table as a B+ tree, so a single-table lookup costs O(log n).
The problem with the B+ tree is that the more data it holds, the taller it grows, and each extra level costs one more disk IO, which performs poorly.
Therefore, as the total data volume n grows, retrieval inevitably slows down.
No amount of indexing or tuning fixes this; the only remedy is to reduce the amount of data in a single table.

For problems with a large amount of data, the following optimizations are made:

  • Data archiving

  • Sub-tables

Data archiving

According to the 80/20 rule, 80% of a system's performance overhead is spent on 20% of its business, and data is no exception.
From the perspective of access frequency, data frequently accessed by the business is called hot data; the rest is cold data.
Order data is time-based: recent orders are hot, and orders go cold as they age.
Understanding the hot/cold profile of the data guides targeted performance optimization,
both at the business level and at the technical level.

Optimization at the business level:

Generally, e-commerce sites only allow querying orders from the last 3 months; to view older orders, the user visits a separate historical-orders page.

Optimization at the technical level:

In most cases, queries retrieve recent orders, yet the order table also stores a large amount of rarely used old data.

The new and old data can therefore be stored separately, moving historical orders into another table.

With some corresponding changes to the query logic in the code, this effectively relieves the large-data-volume problem.

Sub-tables

Sub-tables come in two flavors, vertical and horizontal:

  • Horizontal table splitting: within the same database, split one table's rows into multiple tables according to certain rules;

  • Vertical table splitting: split a table by columns into multiple tables, each storing a subset of the fields.

The main purpose here is to reduce the number of IOs and reduce the height of the B+ tree. Therefore, the main consideration is horizontal table partitioning.

According to the industry's reference standard, a single table should hold 5 to 10 million rows, which keeps the B+ tree at 2-3 levels; a record can then generally be read within 2-3 IO operations.
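A quick back-of-the-envelope check makes that guideline concrete. Assuming 16 KB InnoDB pages, a BIGINT primary key (8 bytes) plus a 6-byte page pointer in internal nodes, and rows of roughly 1 KB (these numbers are illustrative assumptions, not from the source):

```latex
% Internal-node fanout: keys + pointers that fit in one 16 KB page
f \approx \frac{16384}{8 + 6} \approx 1170
% Rows per 16 KB leaf page at ~1 KB per row
r \approx \frac{16384}{1024} = 16
% Capacity of a height-3 tree: two internal levels plus the leaf level
N_{3} \approx f^{2} \cdot r \approx 1170^{2} \times 16 \approx 2.2 \times 10^{7}
```

So a 3-level tree tops out around the 20-million-row mark; past that, the tree needs a fourth level, and every lookup pays one more disk IO.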

However, table splitting is usually analyzed and implemented together with database splitting.

We will therefore return to it below, together with the third major optimization: high throughput.

  3. Optimization for High Throughput


Since 2017, the mall business has been growing rapidly: order volume has repeatedly hit new highs, and throughput has soared, both at the application layer and at the MySQL layer.

However, a single MySQL instance has limited processing capacity. When the pressure gets too high, the RT of every request rises first and access slows down, until the whole database is dragged down or even crashes.

The optimizations for high throughput are:

  • Use caching

  • Read-write separation

  • Sub-databases

The three go-to moves of high-concurrency architecture: caching, pooling, and asynchrony.

Use caching

The first thing to consider is the distributed cache Redis. Putting Redis in front of MySQL can absorb most query requests and reduce response latency.

Beyond that, hot data can be served from a second-level or even third-level cache.

Caching, however, works best for data with local or periodic hotspots.

For example, the commodity, coupon, and event systems all have local or periodic hot data, so they can use first-, second-, or even third-level caches.

The order system, however, does not fit this pattern.

Order data has a characteristic: every user's orders are different, so the cache hit rate in the order system is not high and there is no especially hot data. First- and third-level caches are therefore not used.

A Redis second-level cache can still hold the most recent orders, though, and the most recent order is exactly the data a user is most likely to access.

So a distributed Redis cache can still take pressure off the DB, and that much is worth having.
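As a minimal sketch of that idea, here is a cache-aside pattern with the Jedis client; the class, the `loadFromDb`/`writeToDb` helpers, and the 30-minute TTL are illustrative assumptions, not the vivo implementation:

```java
import redis.clients.jedis.Jedis;

// Illustrative cache-aside sketch: Redis in front of MySQL for recent orders.
public class OrderCache {
    private static final int TTL_SECONDS = 30 * 60; // recent orders stay hot ~30 min (assumed)
    private final Jedis jedis = new Jedis("localhost", 6379);

    public String getOrderJson(long orderId) {
        String key = "order:" + orderId;
        String cached = jedis.get(key);            // 1. try the cache first
        if (cached != null) {
            return cached;                         // cache hit: no DB round trip
        }
        String fromDb = loadFromDb(orderId);       // 2. miss: query MySQL
        if (fromDb != null) {
            jedis.setex(key, TTL_SECONDS, fromDb); // 3. populate the cache with a TTL
        }
        return fromDb;
    }

    public void updateOrder(long orderId, String newJson) {
        writeToDb(orderId, newJson);               // write the DB first...
        jedis.del("order:" + orderId);             // ...then invalidate (not update) the cache
    }

    private String loadFromDb(long orderId) { /* SELECT ... FROM orders */ return null; }
    private void writeToDb(long orderId, String json) { /* UPDATE orders ... */ }
}
```

Invalidating on write, rather than rewriting the cached value, keeps the cache from serving data older than the DB after an order changes.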

Read-write separation

The main library executes data update requests and synchronizes changes to all slave libraries in real time; multiple slave libraries then share the query load.

The issues are these:

order data involves many update operations, so the pressure on the main library at order peaks remains unresolved.

There is also master-slave replication lag. Normally it is tiny, under 1 ms, but it can still make master and slave momentarily inconsistent.

Every affected business scenario then has to tolerate this, sometimes with compromises.

For example, after placing an order successfully, the user first lands on an order-success page and only sees the order after manually clicking through to view it.
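At the application layer, the read/write split is often implemented with a routing datasource. The sketch below uses Spring's `AbstractRoutingDataSource` as a generic illustration; the `master`/`slave` lookup keys and the `DbContext` ThreadLocal holder are hypothetical, not vivo's actual code:

```java
import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

// Hypothetical routing datasource: writes go to the master, reads to a slave.
public class ReadWriteRoutingDataSource extends AbstractRoutingDataSource {
    @Override
    protected Object determineCurrentLookupKey() {
        // DbContext is an assumed ThreadLocal holder set by service code or an AOP aspect
        return DbContext.isReadOnly() ? "slave" : "master";
    }
}

class DbContext {
    private static final ThreadLocal<Boolean> READ_ONLY = ThreadLocal.withInitial(() -> false);
    static void markReadOnly(boolean ro) { READ_ONLY.set(ro); }
    static boolean isReadOnly() { return READ_ONLY.get(); }
}
```

The target `master`/`slave` datasources would be registered via `setTargetDataSources(...)`, and read-only service methods would call `DbContext.markReadOnly(true)` before querying.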

Sub-databases

Sub-databases likewise come in vertical and horizontal flavors:

Horizontal database splitting: spread the rows of the same table across different databases according to certain rules; each database can sit on a different server;

Vertical database splitting: group tables by business and distribute them to different databases, each possibly on a different server; the core idea is a dedicated database for a dedicated purpose.

Sub-databases solve the problem of overall throughput;

sub-tables solve the problem of single-table throughput.

After weighing the transformation cost, the effect, and the impact on existing business, they decided to reach straight for the last resort: sub-databases plus sub-tables.

Sub-database and sub-table technology selection

The technology selection for sub-databases and sub-tables mainly weighed the following directions:

Client-side SDK open-source solutions

Middleware proxy open-source solutions

The self-developed framework provided by the company's middleware team

Building their own wheel from scratch

Drawing on experience from previous projects, and after consulting the company's middleware team, they adopted the open-source Sharding-JDBC solution.

Sharding-JDBC has since been renamed Sharding-Sphere. Official links:

Github: https://github.com/sharding-sphere/

Documentation: the official docs are rough, but online materials, source-code analyses, and demos are plentiful

Community: active

Characteristics: delivered as a jar, client-side sharding, supports XA transactions

Sub-database and sub-table strategy

Given the business characteristics, the user ID was chosen as the sharding key.

The database and table numbers for a user's order data are derived by hashing the user ID and taking a modulus.

Suppose there are n databases, each containing m tables.

Then the database/table numbers are computed as:

Database index: Hash(userId) / m % n

Table index: Hash(userId) % m

The routing process is shown in the figure below:
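The two formulas translate directly into code. A sketch, with illustrative values for n and m; masking off the sign bit is an added assumption so that negative `hashCode()` values cannot produce negative indices:

```java
// Shard routing: n databases, each holding m tables (values are examples only).
public final class OrderShardRouter {
    private static final int DB_COUNT = 4;     // n, illustrative
    private static final int TABLE_COUNT = 8;  // m, illustrative

    // Non-negative hash of the sharding key (userId).
    private static int hash(String userId) {
        return userId.hashCode() & 0x7fffffff;
    }

    // Database index: Hash(userId) / m % n
    public static int dbIndex(String userId) {
        return hash(userId) / TABLE_COUNT % DB_COUNT;
    }

    // Table index: Hash(userId) % m
    public static int tableIndex(String userId) {
        return hash(userId) % TABLE_COUNT;
    }

    public static void main(String[] args) {
        String userId = "10086";
        System.out.printf("order_db_%d.t_order_%d%n", dbIndex(userId), tableIndex(userId));
    }
}
```

Dividing by m before taking the modulus n makes the database index and the table index vary independently, so the data spreads evenly across all n x m tables.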

Limitations of sub-databases/sub-tables and countermeasures

Sharding solves the data-volume and concurrency problems, but it greatly limits the database's query capability:

some joins that used to be trivial become impossible after sharding,

and SQL that Sharding-JDBC does not support has to be rewritten separately.

Beyond that, these challenges came up:

① Globally unique ID design

After sharding, the database auto-increment primary key is no longer globally unique and cannot serve as the order number.

Yet many internal inter-system interfaces carry only the order number, without the user-ID sharding key. How can the order number alone locate the right database and table?

The trick: when the order number is generated, the database/table index is embedded in it.

In scenarios without a user ID, the database/table index can then be extracted from the order number itself.

ID-generation logic is complex: it must deliver high concurrency and high performance while also handling problems such as clock rollback.

The industry offers many reference implementations, each with its own strengths: Twitter's Snowflake ID, Baidu's snowflake-style ID, and the Snowflake implementation in the Sharding-JDBC source code.
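A sketch of a snowflake-style generator that hides the database/table index inside the order number. The bit layout, epoch, and field widths are illustrative assumptions; a production generator must additionally handle clock rollback, worker IDs, and so on:

```java
// Illustrative order-number layout: [timestamp | db index | table index | sequence].
public final class OrderNoGenerator {
    private static final long EPOCH = 1514736000000L; // 2018-01-01, assumed epoch
    private static final int DB_BITS = 5, TABLE_BITS = 5, SEQ_BITS = 12;
    private long sequence = 0L;
    private long lastMillis = -1L;

    public synchronized long next(int dbIndex, int tableIndex) {
        long now = System.currentTimeMillis();
        if (now == lastMillis) {
            sequence = (sequence + 1) & ((1 << SEQ_BITS) - 1);   // wrap within the same ms
            if (sequence == 0) {                                 // exhausted: wait for next ms
                while ((now = System.currentTimeMillis()) <= lastMillis) { }
            }
        } else {
            sequence = 0;
        }
        lastMillis = now;
        return ((now - EPOCH) << (DB_BITS + TABLE_BITS + SEQ_BITS))
                | ((long) dbIndex << (TABLE_BITS + SEQ_BITS))
                | ((long) tableIndex << SEQ_BITS)
                | sequence;
    }

    // Recover the shard location from an order number alone (no userId needed).
    public static int dbIndexOf(long orderNo) {
        return (int) (orderNo >> (TABLE_BITS + SEQ_BITS)) & ((1 << DB_BITS) - 1);
    }
    public static int tableIndexOf(long orderNo) {
        return (int) (orderNo >> SEQ_BITS) & ((1 << TABLE_BITS) - 1);
    }
}
```

Given only an order number, `dbIndexOf`/`tableIndexOf` recover the shard location, which is exactly what the interfaces that carry no user ID need.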

② Historical order numbers carry no embedded database/table information

A separate table stores the mapping from historical order numbers to user IDs. As time passes, these orders gradually stop flowing between systems and the mapping fades out of use.

③ The admin console must page through all orders matching arbitrary filter conditions

Order data is redundantly stored in the Elasticsearch search engine, used only for back-office queries.

  4. Data Consistency Optimization for the High-Speed Search Engine

To support aggregated, high-speed order search, order data is stored redundantly in Elasticsearch.
So how is incremental consistency maintained between the order data in MySQL and the order data in ES?
That phrasing is a bit bookish. Put plainly: after order data changes in MySQL, how does the change get synchronized to ES?
As noted above, the order data is stored redundantly in Elasticsearch for the convenience of admin-console queries.
The considerations here are the timeliness and consistency of synchronization, minimal intrusion into business code, and no impact on the service's own performance.

MQ scheme

The ES update service acts as a consumer: it receives order-change MQ messages and updates ES accordingly.
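Sketched as code, the consumer boils down to re-indexing the order document on every change message. The `orders` index name and the JSON message body are assumptions; the ES call uses the REST high-level client:

```java
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;

// Sketch of the ES-update consumer: it receives an order-change message from MQ
// and (re)indexes the full order document.
public class OrderEsSyncConsumer {
    private final RestHighLevelClient es;

    public OrderEsSyncConsumer(RestHighLevelClient es) { this.es = es; }

    // Called by the MQ framework for every order-change message.
    public void onOrderChanged(String orderId, String orderJson) throws Exception {
        IndexRequest request = new IndexRequest("orders")
                .id(orderId)                       // same id overwrites the old document
                .source(orderJson, XContentType.JSON);
        es.index(request, RequestOptions.DEFAULT); // write/overwrite the document
    }
}
```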

Binlog scheme

With the help of open-source projects such as canal, the ES update service disguises itself as a MySQL slave node, receives the binlog, parses it into real-time change events, and updates ES from those events.

The binlog scheme is the more general of the two, but it is also more complex to implement; the MQ scheme was chosen in the end,

because the ES data is used only by the admin console, where the requirements on data reliability and synchronization latency are not especially strict.

To cover extreme cases such as crashes and lost messages, a back-office function was added that manually re-synchronizes ES data by given conditions, as compensation.

  5. Choosing the Database Migration Approach Sensibly

Migrating the data from the original single-instance database to the new database cluster was another major technical challenge.

Not only must data correctness be ensured; after every step, should anything go wrong, it must be possible to roll back quickly to the previous step.

They considered two plans, downtime migration and non-stop migration:

Non-stop migration plan

  • Copy the old library's data into the new library, and bring online a synchronization program that uses the binlog (or similar) to sync old-library data to the new library in real time;

  • Bring online the order service with dual-write support for old and new libraries, initially reading and writing only the old library;

  • Enable dual writes, stop the synchronization program at the same time, and start a comparison-and-compensation program to make sure the new library's data matches the old library's;

  • Gradually switch read requests over to the new library;

  • Switch both reads and writes to the new library, with the comparison-and-compensation program ensuring the old library stays consistent with the new;

  • Decommission the old library, the order dual-write feature, the synchronization program, and the comparison-and-compensation program.

Downtime migration plan

  • Bring the new order system online, run the migration program to copy all orders older than two months into the new library, and audit the data;

  • Stop the mall V1 application so that the old library's data no longer changes;

  • Run the migration program to copy the orders not covered in step one into the new library, and audit them;

  • Bring the mall V2 application online and start verification; if it fails, fall back to the mall V1 application (the new order system has a switch to dual-write the old library).

Considering that the non-stop plan carried a higher transformation cost while a night-time shutdown caused little business loss, the downtime migration plan was finally chosen.

  6. Choosing the Distributed Transaction Scheme Sensibly

Distributed transactions are a classic problem in e-commerce transaction flows. For example:

  • after a user pays successfully, the shipping system must be notified to ship the goods;

  • after a user confirms receipt, the points system must be notified to grant shopping-reward points.

How, then, is data consistency guaranteed under the microservice architecture?

Different business scenarios demand different levels of consistency. Among mainstream industry solutions, two-phase commit (2PC) and three-phase commit (3PC) address strong consistency, while TCC, local messages, transactional messages, and best-effort notification address eventual consistency.

The local message table scheme they use:

within the local transaction, the asynchronous operation to be executed is recorded in a message table; if execution fails, a scheduled task compensates for it.

The figure below takes granting points after order completion as an example; a code sketch follows.
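A minimal sketch of the local message table pattern for this example, in plain JDBC; the `t_order`/`t_local_msg` table names and columns are illustrative assumptions:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;

// Sketch: complete the order and record the "grant points" task in ONE local transaction.
public class OrderCompletionService {

    public void completeOrder(Connection conn, long orderId) throws Exception {
        conn.setAutoCommit(false);
        try {
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE t_order SET status = 'COMPLETED' WHERE id = ?")) {
                ps.setLong(1, orderId);
                ps.executeUpdate();
            }
            // Same transaction: enqueue the async work in the local message table.
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO t_local_msg (biz_id, type, status) "
                            + "VALUES (?, 'GRANT_POINTS', 'PENDING')")) {
                ps.setLong(1, orderId);
                ps.executeUpdate();
            }
            conn.commit(); // both rows commit, or neither does
        } catch (Exception e) {
            conn.rollback();
            throw e;
        }
    }

    // A scheduled task scans PENDING rows, calls the points system, and marks them
    // DONE on success; failed rows stay PENDING and are retried on the next run.
}
```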

  7. Some Other Detail Optimizations

Network isolation

Only a very small number of third-party interfaces are reachable from the external network, and every one of them verifies signatures.

Internal systems interact through intranet domain names and RPC interfaces without signatures, which improves performance as well as security.

Concurrent locking

In a distributed setting, concurrent updates to the same order can occur.

Before any order update operation, a database row-level lock is taken to prevent concurrent updates; a sketch follows.
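In InnoDB, the row-level lock is typically taken with `SELECT ... FOR UPDATE` inside a transaction; a sketch with assumed table and column names:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Sketch: serialize updates to a single order via an InnoDB row lock.
public class OrderUpdater {
    public void updateStatus(Connection conn, long orderId, String newStatus) throws Exception {
        conn.setAutoCommit(false);
        try {
            // Lock the order row; concurrent updaters block here until we commit.
            try (PreparedStatement lock = conn.prepareStatement(
                    "SELECT status FROM t_order WHERE id = ? FOR UPDATE")) {
                lock.setLong(1, orderId);
                try (ResultSet rs = lock.executeQuery()) {
                    if (!rs.next()) throw new IllegalStateException("order not found");
                }
            }
            try (PreparedStatement upd = conn.prepareStatement(
                    "UPDATE t_order SET status = ? WHERE id = ?")) {
                upd.setString(1, newStatus);
                upd.setLong(2, orderId);
                upd.executeUpdate();
            }
            conn.commit(); // releases the row lock
        } catch (Exception e) {
            conn.rollback();
            throw e;
        }
    }
}
```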

Idempotency

In a distributed setting, the same order may receive duplicate updates.

All interfaces are idempotent, so retries triggered by the caller's network timeouts do no harm; a sketch follows.
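One common way to make a state change idempotent is a conditional update whose WHERE clause encodes the expected current state; a sketch with assumed table and column names:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;

// Sketch: an idempotent state transition. A duplicate or retried call matches
// zero rows, so repeating it has no effect.
public class IdempotentOrderOps {
    public boolean markPaid(Connection conn, long orderId) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE t_order SET status = 'PAID' WHERE id = ? AND status = 'UNPAID'")) {
            ps.setLong(1, orderId);
            return ps.executeUpdate() == 1; // false => already processed, harmless no-op
        }
    }
}
```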

Circuit breaking

In a distributed setting, faults must be contained to avoid a systemic avalanche in which one failing point drags down the whole.

Using the Hystrix component, real-time calls to external systems are wrapped with circuit-breaker protection, preventing any single system's failure from spreading across the entire distributed system.
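A minimal HystrixCommand wrapper looks like this; the logistics-query example and the RPC stub are illustrative, but `run()`/`getFallback()` are the real Hystrix extension points:

```java
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

// Sketch: wrap a call to an external system in a HystrixCommand so that
// timeouts and failures trip the circuit breaker instead of cascading.
public class QueryLogisticsCommand extends HystrixCommand<String> {
    private final long orderId;

    public QueryLogisticsCommand(long orderId) {
        super(HystrixCommandGroupKey.Factory.asKey("LogisticsService"));
        this.orderId = orderId;
    }

    @Override
    protected String run() {
        // the real remote call to the logistics system would go here (assumed RPC)
        return remoteQuery(orderId);
    }

    @Override
    protected String getFallback() {
        // returned when the call fails, times out, or the breaker is open
        return "logistics info temporarily unavailable";
    }

    private String remoteQuery(long orderId) { /* RPC stub */ return "..."; }
}
```

Calling `new QueryLogisticsCommand(orderId).execute()` runs the command; once failures cross the configured threshold, Hystrix opens the circuit and serves the fallback immediately.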

All-round monitoring and alerting

Error-log alerts configured on the logging platform and service-analysis alerts on the tracing system,

combined with the monitoring and alerting of the company's middleware and base components, make it possible to spot system anomalies the moment they occur.

Message ordering problem

When MQ consumption is used to synchronize order data from the database into ES, the data written to ES may not be the latest state of the order.

The left side of the figure above shows the original scheme:

while consuming the order-sync MQ messages, thread A runs first and reads the order data;

the order is then updated, and thread B starts its own sync, reads the updated order, and writes it into ES one step ahead of thread A;

when thread A finally performs its write, it overwrites the data thread B wrote, leaving stale order data in ES.

The right side of the figure shows the fix:

take a row lock when reading the order data and run the whole operation inside a transaction, so the next thread only proceeds after the current one has finished.
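Combining the row lock with the ES write from the earlier consumer sketch gives roughly the following shape; the `order_json` column is an assumption:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Sketch of the fixed consumer: the DB read and the ES write happen inside one
// transaction that holds the order's row lock, so two threads syncing the same
// order are serialized and the later writer always carries the newer data.
public class OrderedEsSync {
    public void sync(Connection conn, OrderEsSyncConsumer esConsumer, long orderId) throws Exception {
        conn.setAutoCommit(false);
        try {
            String orderJson = null;
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT order_json FROM t_order WHERE id = ? FOR UPDATE")) { // row lock
                ps.setLong(1, orderId);
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) orderJson = rs.getString(1);
                }
            }
            if (orderJson != null) {
                // ES write happens while the row lock is still held
                esConsumer.onOrderChanged(String.valueOf(orderId), orderJson);
            }
            conn.commit(); // releases the lock; the next thread proceeds
        } catch (Exception e) {
            conn.rollback();
            throw e;
        }
    }
}
```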

Origin: blog.csdn.net/qq_44936392/article/details/129007775