Advantages of using a split database in a project

Why split the database?

Depends on database load and data volume.

At the beginning of the construction of a single project, the load and data volume of the database are not large, so there is no need to split the database. A MySQL database instance is basically enough for small financial systems, document systems, ERP systems, and OA systems. .

As mentioned in "The Ten Years of Taobao Technology", the data volume of e-commerce business is growing rapidly, so the initial PHP+MySQL architecture can no longer meet the actual requirements, so the first way Taobao thought of was to use MySQL Replaced with Oracle. But not long after, around 2008, the single-node Oracle database was not easy to use, so Taobao finally bid farewell to the single-node database and began to split the database. From one node to multiple nodes.

Splitting the database is particular. For example, there are two splitting methods: vertical splitting and horizontal splitting. Then do you split horizontally or vertically first? Does the order matter? No, the order does matter, and the order must never be wrong: first split horizontally, then split vertically .

What is vertical splitting?

Vertical splitting is to split the database according to the business (that is, there are different databases for the tables of different businesses). The data tables of the same type of business are split into an independent database, and the data tables of another type are split into other databases.

For example, for a new retail e-commerce database, we can split the product-related data tables into a database, and then build a product system based on these data tables. For example, use JAVA or PHP language to create a mall system. Then split the data tables related to follow-up, sales and inventory to another database, and then use the program to build a warehouse system.

What Problem Does Vertical Slicing Solve?

Vertical sharding can reduce the load on a single-node database. It turns out that all data tables are placed on one database node, and no doubt all read and write requests are also sent to this MySQL, so the load on the database is too high. If the database of a node is split into multiple MySQL databases, the load of each MySQL database can be effectively reduced.

What problems cannot be solved by vertical splitting

Vertical splitting can’t solve table shrinkage. For example, no matter which database node the commodity table is divided into, there are still so many records in the commodity table. No matter how detailed you split the database vertically, the amount of data in each data table is No change.

MySQL has more than 20 million records in a single table, and the read and write performance will drop rapidly. Therefore, vertical splitting cannot achieve the effect of shrinking the table.

What is horizontal splitting?

Horizontal segmentation is to divide data into multiple data tables according to certain rules of a certain field. A data table is divided into parts and split into multiple data tables, so that the effect of shrinking the table can be achieved.

Many people misunderstand horizontal segmentation, thinking that the data tables obtained by horizontal segmentation must be stored on different MySQL nodes. In fact, the horizontally split data tables can also be stored on a MySQL node. It's not that horizontal sharding necessarily requires multiple MySQL nodes. Why do you say that?

Many people don't know that MySQL comes with a data partitioning technology, which can divide and store the data of a table in different directories according to special rules. If we mount multiple hard disks to the Linux host, we can use MySQL partitioning technology to divide and store the data of a table on multiple hard disks. In this way, the original limited IO capability of one hard disk is upgraded to the enhanced IO capability of multiple disks.

Uses of split horizon

Horizontal sharding can split the data of a table into multiple data tables, which can play a role in shrinking the table.

But not all data tables need to be split horizontally. Data segmentation is only required for data tables with a large amount of data, such as user tables, commodity tables, product tables, address tables, order tables, and so on in e-commerce systems. Some data tables do not need to be split because the amount of data is not large, such as brand tables, supplier tables, and warehouse tables, which do not need to be split.

Disadvantages of split horizon

The segmentation rules of different data tables are not consistent, and should be determined according to the actual business. Therefore, when we choose database middleware products, we must choose products with rich segmentation rules. Common database middleware are: MyCat, Atlas, ProxySQL, etc. Some people think that MyCat is developed in Java language, so they doubt the efficiency of MyCat. In fact, the role of database middleware is equivalent to the router of SQL statements. The hardware configuration of your home router is not very high, but it does not affect your enjoyment of 100M broadband. The same is true for MyCat, which only plays the role of SQL statement forwarding and does not actually execute SQL statements. The main reason why I recommend using MyCat is that it comes with a lot of data segmentation rules. We can segment data according to the primary key, segment data according to the primary key range, and segment data according to date, etc. Therefore, in order to meet the needs of the business, MyCat is currently considered a very good middleware product.

Another disadvantage of horizontal sharding is that it is more troublesome to expand capacity. Over time, sharding will not be enough sooner or later. At this time, it is not the first choice to add new cluster fragments. Because a MySQL shard requires 4 to 8 MySQL nodes (minimum scale), the investment cost of adding a shard is very high. Therefore, the correct approach is to separate hot and cold data and archive the data in the shards regularly. Transfer expired business data from shards to archives. Currently, the MySQL engine with the highest data compression ratio is TokuDB, and the writing speed with transactions is 6-14 times that of the InnoDB engine. It is most suitable to use TokuDB as an archive database.

Why do horizontal splitting first and then vertical splitting?

As the amount of data increases, the first thing to do is to shard the data, and use multiple hard disks to increase the data IO capability and storage space. The cost of doing so is the lowest. You can get good IO performance for a few hard drives.

Entering the next stage, the amount of data continues to increase. At this time, we should divide the data into multiple MySQL nodes and use MyCat to manage the data division. Of course, we also need to do the separation of reading and writing of data, etc., which will not be discussed here. While performing horizontal segmentation in the background, the business system can also introduce load balancing, distributed architecture, and so on. Theoretically, after using hot and cold data separation, the method of horizontal segmentation can continue for a long time, no matter how large the data volume is, just archive it regularly.

The database has reached the stage of horizontal segmentation, and the increase in data volume is no longer the main reason for changing the architecture design. On the contrary, the business system can't bear it at this stage. If the system is not divided into modules, the business system will not be able to sustain it. Therefore, a system is divided into several subsystems according to the modules and businesses. Among several subsystems, the data is relatively independent. For example, Taobao will not share all the data with Alipay, and share the same set of data tables, which will also affect the development of their respective businesses. So it is necessary to do vertical segmentation, classify the data tables, and split them into several database systems.

Speaking of which, think carefully. If the database is vertically split prematurely, it is necessary to rebuild several independent business systems, and the workload is too huge. Horizontal segmentation does not require major modifications to the business system, so we should start with horizontal segmentation first.

Guess you like

Origin blog.csdn.net/vcit102/article/details/131800219