Vivo Cloud Service Mass Data Storage Architecture Evolution and Practice

1. Preface

The vivo cloud service lets users back up data such as contacts, text messages, notes, and bookmarks from their mobile phones. The underlying storage uses the MySQL database.

As the vivo cloud service business grew, the number of users rose rapidly and the volume of data stored in the cloud kept swelling. This massive data posed huge challenges to the back-end storage. The cloud service's biggest pain point in recent years has been how to store users' massive data.

2. Facing the Challenges

From 2017 to 2018, the core metric for the cloud service product was user growth, and the product strategy changed significantly: once a user logs in to a vivo account, the cloud service's data sync switch is turned on by default.

This product strategy brought explosive user growth: the user count jumped from the millions to the tens of millions, and the data stored in the back end leapt from tens of billions of rows to hundreds of billions.

To store this massive data, the cloud service practiced all four sharding moves: horizontal table sharding, vertical table sharding, horizontal database sharding, and vertical database sharding.

1. Horizontal Table Sharding

Path of Thorns 1: the single tables for browser bookmarks and notes each held more than 100 million rows. What to do?

Anyone familiar with database and table sharding can answer right away: if a single table holds too much data, shard the table. We did exactly that, splitting the single tables of the browser bookmark and note modules into 100 tables each.

Roughly a billion rows of browser bookmark and note data were migrated into the 100 sub-tables, each carrying about 10 million rows.

This is the first and most familiar move: horizontal table sharding.
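In such a scheme, routing is typically a simple modulo on the user ID. The article does not spell out the production rule, so the following is only an illustrative sketch (the table name and formula are assumptions):

-- With 100 sub-tables: table_index = user_id % 100
-- e.g. user_id 12345 -> 12345 % 100 = 45 -> table t_bookmark_45
SELECT * FROM t_bookmark_45 WHERE user_id = 12345;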


2. Horizontal Database Sharding

Path of Thorns 2: contact and SMS data had already been sharded into tables, but initially only into 50 tables each, without sharding the database. After the explosive user growth, the total contact data in the single database reached several billion rows, and a single table reached 50 million rows. Continued growth would seriously hurt MySQL performance. What to do?

The second move is horizontal database sharding: if one database cannot carry the load, split it into several. We split the original single database into 10 databases and expanded contacts and SMS from 50 tables each to 100 tables each. The migration and rerouting of billions of rows of existing data during this period was extremely painful.
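With databases and tables both sharded, routing becomes two-level: first pick the database, then the table inside it. Again, the article does not give the exact production rule; a common pattern looks like this (names and formula are assumptions):

-- db_index = user_id % 10, table_index = (user_id DIV 10) % 100
-- e.g. user_id 73459 -> database db_9, table t_contact_45
SELECT * FROM db_9.t_contact_45 WHERE user_id = 73459;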


3. Vertical Database and Table Sharding

Path of Thorns 3: initially, the data of all cloud service modules was stored together in one database.

When storage space hit a bottleneck, we analyzed the storage distribution of each module's data. The situation was as follows:

A single database had 5T of disk capacity: contact data occupied 2.75T (55%) of the storage space, SMS data occupied 1T (20%), all other modules together occupied 500G (10%), leaving roughly 1T free. Contacts and SMS alone took up 75% of the total space.

The remaining roughly 1T of capacity could not support the continued growth of user data, and the situation was not optimistic: once space ran out, every module would become unavailable. What to do?

(Figure: distribution of cloud service data storage space at the time)


The third and fourth moves are vertical database sharding and vertical table sharding: we decoupled the storage of contact data, SMS data, and the remaining modules' data, splitting contacts and SMS out into databases of their own.


At this point the cloud service had practiced all four sharding moves: everything that should be split had been split, and everything that should be sharded had been sharded.

4. Dynamic Expansion Based on a Routing Table

Path of Thorns 4: as described above, the separately split contact database used a fixed strategy of 10 databases. The preliminary assessment was that 10 databases × 100 tables could absorb the business's data growth. We thought we could finally sit back and relax, but the growth of contact data exceeded expectations.

Within nine months of the contact database being split out on its own, single-database storage usage climbed from 35% to 65%. At that growth rate, after another six months the independently split contact database would again face insufficient space.

How to solve it? Further expansion was a given; the crux was which expansion strategy to adopt. A conventional expansion, changing the sharding modulus, would force migration and rerouting of the massive existing data, which was far too costly.

After discussion within the technical team, and given the characteristics of the cloud service contact business (old users' contact counts are basically stable, large batches of contacts are rarely added, and the growth of old users' contact data is controllable), we finally adopted a dynamic expansion scheme based on a routing table.

The features of this scheme are as follows (a sketch of the routing table follows the list):

  • Add a user routing table that records which database and table each user's contact data is routed to;
  • New users' contact data is routed to the newly expanded databases, adding no storage pressure to the original old databases;
  • Old users' data is not moved and stays in its original database;
  • The net effect is that the old databases only have to absorb the data growth of old users, while all new users are carried by the newly expanded databases.
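A minimal sketch of what such a routing table and lookup could look like (table and column names are illustrative, not the production schema):

CREATE TABLE t_user_route (
    user_id     BIGINT   NOT NULL PRIMARY KEY,
    db_index    TINYINT  NOT NULL COMMENT 'which contact database holds this user',
    table_index SMALLINT NOT NULL COMMENT 'which table inside that database'
) ENGINE=InnoDB;

-- A new user is assigned to one of the newly expanded databases on first write:
INSERT INTO t_user_route (user_id, db_index, table_index) VALUES (900001, 11, 42);

-- Every later read or write first resolves the route, then queries the target shard:
SELECT db_index, table_index FROM t_user_route WHERE user_id = 900001;

The extra routing lookup costs one indexed read per request; that is the price paid for never having to migrate existing data.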


Although the growth of old users' contacts is controllable, we wanted the original databases to keep 60% of their storage space free to absorb old users' data growth. At the time, only 35% of the old databases' space was still available, which did not meet our requirement.

To reduce the storage space occupied by the old databases' data, we naturally turned to data compression.

3. Pre-research on Compression Schemes

The cloud service pre-researched the following three schemes for compressing database data:

Scheme 1: the application compresses the data itself and stores the compressed result in the database

Advantages:

No database changes are required; the modification is entirely contained in the application, and the fields to compress can be chosen freely.

Disadvantages:

Existing data requires a dedicated task to compress it, and with such a large volume of existing data, compressing it in the application would take an uncontrollably long time.

Once the data is stored compressed, the field contents returned by SELECT queries run directly from the DB platform are no longer readable, which makes later troubleshooting harder.
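To make this drawback concrete: scheme 1 compresses in application code, but the effect can be mimicked with MySQL's built-in COMPRESS()/UNCOMPRESS() functions (the table here is hypothetical):

INSERT INTO t_note (user_id, content) VALUES (123, COMPRESS('note body ...'));

-- A plain query from the DB platform now returns unreadable binary data:
SELECT content FROM t_note WHERE user_id = 123;

-- The content is only readable again after explicit decompression:
SELECT UNCOMPRESS(content) FROM t_note WHERE user_id = 123;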

Scheme 2: use the data compression capability built into MySQL's InnoDB engine

Advantages:

Uses InnoDB's existing compression capability, requires no changes to the upper-layer application, and does not affect subsequent SELECT queries.

Disadvantages:

Suited to scenarios with a large volume of data and relatively infrequent reads and writes; not suitable for businesses with demanding query-performance requirements.

Scheme 3: switch the storage engine from InnoDB to TokuDB and use TokuDB's native data compression

Advantages:

TokuDB natively supports data compression with a choice of algorithms, handles write-heavy workloads well, and has natural advantages for storing large volumes of data.

Disadvantages:

MySQL needs an extra plugin to support the TokuDB engine, no team in the company had mature TokuDB experience at the time, the risks after adoption were unknown, and subsequent DBA maintenance would also be a challenge.

After weighing everything, we finally adopted the second scheme: InnoDB's built-in compression.

The main reasons are as follows:

  • Simple to operate: the DBA changes the file format of the existing InnoDB tables and the data is compressed (a sketch of the DDL follows this list);
  • Controllable compression speed: testing showed that a table with 20 million rows can be fully compressed within one to two days;
  • Low transformation cost: the whole change only requires the DBA to execute the relevant SQL to alter the tables' file format; no upper-layer code changes at all;
  • A good fit for the cloud service: user data backup and restore is not a high-performance, high-QPS scenario, and most cloud service tables are dominated by string fields, which compress very well.
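The per-table change itself is a single DDL statement along these lines (the table name is illustrative):

ALTER TABLE t_contact_00 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

Note that this statement rebuilds the whole table, which is why a 20-million-row table takes one to two days.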

4. Compression Scheme Verification

1. Introduction to InnoDB compression capabilities

Before MySQL 5.1.38, only the built-in InnoDB storage engine existed, with Antelope as the default file format. Antelope supports two row formats (ROW_FORMAT), COMPACT and REDUNDANT, neither of which supports data compression.

MySQL 5.1.38 introduced the InnoDB plugin and, with it, the Barracuda file format. Barracuda is fully compatible with Antelope and additionally supports two more row formats: DYNAMIC and COMPRESSED (which supports data compression).


2. Compression environment preparation

Modify the database configuration: change the file format from the default Antelope to Barracuda.

SET GLOBAL innodb_file_format=Barracuda;

SET GLOBAL innodb_file_format_max=Barracuda;

SET GLOBAL innodb_file_per_table=1;

Note: innodb_file_per_table must be set to 1. The reason is that the InnoDB system tablespace cannot be compressed: it holds not only user data but also InnoDB's internal system information, and can never be compressed. Each table therefore needs its own tablespace for compression to be possible.

Once the settings are in place, run SHOW GLOBAL VARIABLES LIKE '%file_format%' and SHOW GLOBAL VARIABLES LIKE '%file_per%' to confirm the changes took effect.
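The expected output is roughly as follows (a sketch; the exact variable list varies by MySQL version):

SHOW GLOBAL VARIABLES LIKE '%file_format%';
-- innodb_file_format      Barracuda
-- innodb_file_format_max  Barracuda

SHOW GLOBAL VARIABLES LIKE '%file_per%';
-- innodb_file_per_table   ON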


(These SET GLOBAL settings apply only to the running instance and are lost when the MySQL instance restarts. To make them permanent, put them in the MySQL global configuration file.)

3. Testing the Compression Effect

Prepare two tables with identical column definitions: one in the compressed format and one uncompressed.

Compressed table: created with ROW_FORMAT=COMPRESSED, which specifies the compressed row format, and KEY_BLOCK_SIZE=8. The optional KEY_BLOCK_SIZE values 16, 8, and 4 give the compressed InnoDB page size in KB; the smaller the value, the higher the compression. Weighing CPU cost against compression ratio, 8 is the recommended online setting.

Uncompressed table: identical column definitions, default row format (no compression).
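The original figures showed the two tables' DDL; a sketch of what they likely looked like (column definitions are illustrative):

CREATE TABLE t_compress (
    id      BIGINT NOT NULL AUTO_INCREMENT,
    user_id BIGINT NOT NULL,
    content VARCHAR(2000) NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

CREATE TABLE t_nocompress (
    id      BIGINT NOT NULL AUTO_INCREMENT,
    user_id BIGINT NOT NULL,
    content VARCHAR(2000) NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;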

Prepare the data: use a stored procedure to insert the same 100,000 rows into both t_nocompress and t_compress, then compare the space the two tables occupy. A sketch of the procedure and the size check:
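A minimal version of such a procedure, plus the query used to compare the tables' on-disk size (the filler content is arbitrary):

DELIMITER //
CREATE PROCEDURE fill_test_data()
BEGIN
    DECLARE i INT DEFAULT 0;
    WHILE i < 100000 DO
        INSERT INTO t_compress   (user_id, content) VALUES (i, REPEAT('vivo-cloud-', 20));
        INSERT INTO t_nocompress (user_id, content) VALUES (i, REPEAT('vivo-cloud-', 20));
        SET i = i + 1;
    END WHILE;
END //
DELIMITER ;

CALL fill_test_data();

-- Compare the space occupied by the two tables:
SELECT table_name, ROUND((data_length + index_length) / 1024 / 1024) AS size_mb
FROM information_schema.TABLES
WHERE table_schema = DATABASE() AND table_name IN ('t_compress', 't_nocompress');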


The t_compress table occupied 10MB and the t_nocompress table 20MB: a compression ratio of 50%.

Note: the compression effect depends on the column types. Typical data often contains repeated values, so it compresses well; string types such as CHAR, VARCHAR, TEXT, and BLOB usually compress effectively. Binary data (integers or floating-point numbers) and already-compressed data (JPEG or PNG images) usually gain little.

5. Online Practice

The tests above suggested that at a 50% compression ratio, the contact databases' space usage would fall from 65% to 33%, achieving the goal of 60% free space.

But online data must be treated with awe. Before practicing online, the plan had to be verified offline, and we also had to consider the following questions:

1. Do data compression and decompression affect DB server performance?

We used performance stress tests to evaluate the impact on the database server's CPU before and after compression:

With the contact table already holding 20 million rows, we inserted data into it:

Before compression: inserting 50 contacts per request at 200 concurrency for 10 minutes: TPS 150, CPU 33%.


After compression: inserting 50 contacts per request at 200 concurrency for 10 minutes: TPS 140, CPU 43%.


After the tables were compressed, the CPU cost of heavy inserts did rise, but TPS was barely affected. Across repeated stress tests the database server's CPU stabilized at around 40%, which was acceptable for the business.

2. Does changing the tables' file format affect business SQL reads and writes, or break normal business functions?

We did both offline verification and online verification:

Offline verification: in the test environment we converted all contact tables to the compressed format and arranged for a test engineer to help run a full check of the contact features; in the end everything worked normally.

The pre-release environment then repeated the test environment's steps, and the function checks again showed no abnormality.

Online verification: we chose the call-log module's tables, to which users are less sensitive, and compressed one table in one database, watching that table's reads and writes and watching for user complaints.

After a week of continuous observation, the call-log data in that table read and wrote normally, and no abnormal user feedback was received during that period.

3. The volume of online contact data is huge; how do we keep the service stable while compressing?

We made the trade-offs along the following lines:

  • First compress a single contact table and measure how long one table takes;
  • Then, within a single database, compress several tables concurrently and watch CPU usage. The DBA set a ceiling of 55% CPU, and we adjusted the compression concurrency step by step against that bar until CPU held steady around 55%, which gave the maximum number of tables a single database could compress at once;
  • Combining the first two steps, we could estimate roughly how long compressing all tables in all databases would take. After syncing the plan with the project team and the responsible owners, the compression work was carried out step by step.

The final effect of data compression on the online contact database is shown below:

(Figure: disk usage of the online contact database before and after compression)

6. Closing Remarks

This article has introduced the challenges that business growth and massive data storage brought to the cloud service, along with some of the cloud service's experience with database and table sharding and with database data compression, in the hope that it can serve as a reference.

InnoDB data compression is suitable for the following scenarios:

  • Businesses with large data volumes and disk-space pressure on the database;

  • Read-heavy, write-light scenarios; it is not suitable for businesses with high performance and QPS requirements;

  • Tables whose schema is dominated by string-typed columns, since such tables usually compress effectively.

Finally:

  • When sharding databases and tables for a business, fully estimate the growth of the data volume; the data migration forced by later database expansion cuts to the bone.

  • Treat online data with awe; put a solution online only after repeated offline verification.

Author: vivo Platform Product Development Team


Source: blog.51cto.com/14291117/2551183