An uncommon hands-on practice of database and table sharding

Background

Not long ago I published two articles about table sharding.

As their titles suggest, at that point we had only sharded tables; since then, driven by business growth, we have also sharded the database. It has gone fairly smoothly so far, so while the details are still fresh in my mind, this is a retrospective of the whole exercise.

Let's first review the overall process: the business starts on a single table, that table is eventually split into shard tables, and later some tables are moved out into a separate database entirely.

The whole process is easy to understand and matches the growth path of most companies.

Few systems are designed for sharding from day one; doing so would avoid pitfalls later, but most companies start out focused purely on the business.

Only when a single table can no longer support the business does sharding tables, or even databases, come up naturally.

So this article is a summary, and some of it may repeat what I covered before.

Table sharding

First, under what circumstances is table sharding appropriate?

In my experience, it is when a table has reached tens of millions or even hundreds of millions of rows, and the data is still growing at a rate of more than 2%.

Of course, these numbers are not absolute. What matters most is whether writes and queries against the table have started to hurt normal business operation, for example queries slowing down noticeably or the database's overall IO staying high.

When we talk about table sharding here, we mean horizontal sharding: using some routing algorithm to distribute the data of one large table as evenly as possible across N tables.

Range

There are several table-sharding strategies, each suited to different scenarios.

The first is range-based partitioning. For example, we can split a table by the record's creation date, one table per month; or split by primary key range, say IDs 1 to 10000 in one table, 10001 to 20000 in the next, and so on.

This suits data that needs to be archived. For instance, if the system only offers queries over the last three months of history, it is easy to operate: tables older than three months can simply be detached and kept as backups.
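As a minimal illustration (my own sketch, not from the original article), both range variants boil down to a simple routing function:

```java
public class RangeRouter {

    /** IDs 1-10000 -> table 0, 10001-20000 -> table 1, and so on. */
    static int shardByIdRange(long id) {
        return (int) ((id - 1) / 10_000);
    }

    /** Monthly split: e.g. a record created in 2019-10 -> "order_201910". */
    static String shardByMonth(java.time.LocalDate createdAt) {
        return "order_" + createdAt.format(
                java.time.format.DateTimeFormatter.ofPattern("yyyyMM"));
    }
}
```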

This scheme has both a benefit and a drawback:

  • Benefit: it scales horizontally on its own, with little manual intervention.
  • Drawback: the data may end up uneven (for example, a surge of requests in one particular month).

Hash

Sharding by a range such as date is certainly simple, but its applicability is quite narrow; after all, most queries don't want to carry a time condition.

For example, a user wants to query all the orders he has ever generated, which is a very common requirement.

So we have to change the sharding dimension. A common sharding algorithm is hash + mod, used in combination.

This is a classical algorithm; the famous HashMap stores data the same way.

Suppose we split the original large order table into 64 shard tables:

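A minimal sketch of that routing (the shard key field and class names are illustrative):

```java
public class ShardRouter {

    // 64 shard tables; a power of two, which matters for later expansion.
    private static final int SHARD_COUNT = 64;

    /** Maps a shard key to a table index, e.g. 17 -> order_17. */
    public static int shardIndex(String shardKey) {
        int h = shardKey.hashCode();
        h = h ^ (h >>> 16);  // spread the high bits, the same trick HashMap uses
        return (h & Integer.MAX_VALUE) % SHARD_COUNT;
    }
}
```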

Here, hash means running the field we shard on through a hash function, so that the hashed values are as uniform as possible and do not repeat.

Of course, if the field is itself an integer and never repeats, the hash step can be skipped; taking the Mod directly gives the shard index.

Choosing the number of shard tables

As for the shard count here (64), it was a deliberate choice, but there is no standard value; it should be estimated from your own business needs and projected data growth.

Based on my personal experience, at the very least you need to ensure that, after sharding, no single shard will again accumulate excessive data (say, reach ten million rows) within the next few years of business growth.

Within what the database can comfortably hold, I lean toward creating as many shard tables as possible; after all, if a shard later hits its own bottleneck and the tables have to be expanded again, the process is very painful.

I have not experienced that myself so far, so I won't go into it here.

But the count should not be picked blindly either: as with HashMap, a power of two (2^n) is recommended, so that an expansion can migrate as little data as possible.
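A small illustration of why a power-of-two count helps (my own example, not from the original text): when the table count doubles, each record either stays in its shard or moves to exactly one predictable new shard, so roughly half the rows don't move at all.

```java
public class PowerOfTwoDemo {
    public static void main(String[] args) {
        int hash = "order-12345".hashCode() & Integer.MAX_VALUE;

        int oldIndex = hash & (64 - 1);   // same as hash % 64 when the count is 2^n
        int newIndex = hash & (128 - 1);  // after doubling from 64 to 128 tables

        // The destination is decided by one extra hash bit:
        // either newIndex == oldIndex, or newIndex == oldIndex + 64.
        System.out.println(oldIndex + " -> " + newIndex);
    }
}
```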

Range + Hash

Of course, there is another idea: can Range and Hash be mixed?

For example, we start out with Hash sharding, but data grows so fast that each shard quickly hits its bottleneck, forcing an expansion, say from 64 tables to 256.

But migrating the data without downtime during such an expansion is very hard to do; and even with downtime, who can say how long it would last?

So could we subdivide further: shard by Mod within each monthly Range, using Range's natural scalability so that subsequent data migration never needs to be considered?

This should work in theory, but I have not actually used it; take it as a reference idea.
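A hypothetical routing function for such a mixed scheme (the table-name pattern and per-month shard count are illustrative):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class RangeHashRouter {

    // Illustrative: each month gets its own group of 64 mod-sharded tables.
    private static final int SHARDS_PER_MONTH = 64;

    /** e.g. createdAt 2019-10-14, orderId 12345 -> "order_201910_57" */
    public static String tableName(LocalDate createdAt, long orderId) {
        String month = createdAt.format(DateTimeFormatter.ofPattern("yyyyMM"));
        long index = orderId % SHARDS_PER_MONTH;
        return "order_" + month + "_" + index;
    }
}
```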

Annoying data migration

Settling on a sharding rule only completes the first step; the real trouble is data migration, or rather, how to migrate the data with minimal impact on the business.

Unless you shard from the very beginning, there is no escaping the data migration step.

Below is the practice we followed, for your reference:

  1. Once the shard tables go live, all data writes and queries target the shard tables, so the data in the original large table has to be migrated into them; otherwise the business impact is severe.
  2. We estimated that migrating a table of about 200 million rows with a self-written migration program would take roughly four to five days.
  3. That means during those days the older data would be invisible to users, which the business obviously cannot accept.
  4. So we made a compatibility arrangement: after the sharding change goes live, all newly written data goes into the shard tables, while operations on historical data still go to the old large table, sparing us the migration step for the moment.
  5. It only requires a routing decision before each data operation (see the sketch after this list). Once enough time has passed that almost all operations hit new data (two months in our case), we start migrating the historical data from the old table, and remove the routing decision once the migration is complete.
  6. In the end, all data is read from and written to the shard tables.
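A sketch of what that transition-period routing decision might look like (the launch time, DAO names, and types are all illustrative, not the author's actual code):

```java
import java.time.Instant;

public class TransitionRouter {

    // Illustrative: the moment the shard tables went live.
    private static final Instant SHARD_LAUNCH = Instant.parse("2019-08-01T00:00:00Z");

    private final OrderDao shardDao;   // operates the new shard tables
    private final OrderDao legacyDao;  // operates the original large table

    public TransitionRouter(OrderDao shardDao, OrderDao legacyDao) {
        this.shardDao = shardDao;
        this.legacyDao = legacyDao;
    }

    /** Route by creation time during the transition; delete once migration is done. */
    public Order find(long orderId, Instant createdAt) {
        return createdAt.isAfter(SHARD_LAUNCH)
                ? shardDao.find(orderId)     // new data lives in the shard tables
                : legacyDao.find(orderId);   // old data is still in the big table
    }

    public interface OrderDao { Order find(long orderId); }
    public interface Order {}
}
```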

At this point, the whole table-sharding operation is complete.

Business compatibility

After sharding, other business needs still have to be accommodated, such as the original reporting and paging queries; let's look at how we handled them.

Reports

First, reports. Before sharding, a report needed only one query against one table; now that one table has become N tables, it's a different story.

So the original queries have to be changed to iterate over all the shard tables; for performance, the shards can be queried with multiple threads and the results aggregated afterwards.
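A minimal sketch of such a fan-out aggregation (the DAO interface and thread-pool size are illustrative):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ReportAggregator {

    private final ExecutorService pool = Executors.newFixedThreadPool(16);

    /** Queries all 64 shard tables in parallel and sums the per-shard counts. */
    public long countOrders(ShardDao dao) {
        List<CompletableFuture<Long>> futures = IntStream.range(0, 64)
                .mapToObj(i -> CompletableFuture.supplyAsync(
                        () -> dao.count("order_" + i), pool))
                .collect(Collectors.toList());
        return futures.stream().mapToLong(CompletableFuture::join).sum();
    }

    public interface ShardDao { long count(String tableName); }
}
```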

But relying on Java alone to run statistical analysis over such data volumes is unrealistic; it got us through the early days, but later we had to hand this over to a big data platform.

Queries

Then there are the ordinary queries. The original paging queries certainly can't be kept; after all, paging through hundreds of millions of rows has little practical meaning anyway.

Queries can only be offered on the sharding field. For example, if the tables are sharded by order ID, every query has to carry that field; otherwise it would mean traversing all the shard tables.

This is a problem every sharding effort runs into, unless you move off relational databases like MySQL altogether.

Database sharding

Table sharding relieves the pressure on a single table, but the pressure on the database itself has not dropped.

Within a month of finishing the table sharding, "other tables" in the same database drove up the write IO of the whole database, and those "other tables" had little to do with the core business.

In other words, non-essential data ended up affecting the overall business, which is simply not worth it.

So we moved those few tables out into a new database, completely isolated from the existing business.

This involved several pieces of rework:

  1. The places where the application itself queried this data were changed to call a standalone Dubbo service, and only that service operates on the migrated tables.
  2. The data migration was postponed, so queries need a compatibility layer just as with table sharding: old data is queried in the current database, while new data is fetched through the Dubbo interface (see the sketch after this list).
  3. Queries that used to join against these tables had to be reworked into Dubbo interface calls, with the results assembled in memory.
  4. If the data volume is really large, the synchronous Dubbo interface can be swapped for writes into a message queue to improve throughput.
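A sketch of the compatibility layer from item 2 (the interfaces and names are illustrative; in practice the remote side would be the Dubbo service):

```java
public class CompatQuery {

    private final LocalDao localDao;            // old data, still in the current database
    private final MigratedTableService remote;  // new data, served via the Dubbo interface

    public CompatQuery(LocalDao localDao, MigratedTableService remote) {
        this.localDao = localDao;
        this.remote = remote;
    }

    /** Old rows still live locally; anything not found there is fetched remotely. */
    public Record find(long id) {
        Record local = localDao.find(id);
        return local != null ? local : remote.findById(id);
    }

    public interface LocalDao { Record find(long id); }
    public interface MigratedTableService { Record findById(long id); }
    public interface Record {}
}
```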

Now that this business, with its huge but otherwise unrelated data volume, has been moved to a separate database, the overall database IO has dropped significantly and the business is back to normal.

Summary

Finally, we still need to archive historical data step by step: data older than N months is regularly migrated to storage such as HBase, ensuring that the data kept in MySQL stays within an acceptable range.

Queries over the archived data are then served by the big data platform.
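A hedged sketch of such a periodic archiving pass (the store interfaces and batch size are illustrative; in practice the cold store would be backed by HBase):

```java
import java.time.LocalDate;
import java.util.List;

public class Archiver {

    public interface WarmStore {  // the MySQL shard tables
        List<Row> fetchOlderThan(LocalDate cutoff, int batchSize);
        void delete(List<Row> rows);
    }
    public interface ColdStore {  // e.g. backed by HBase
        void write(List<Row> rows);
    }
    public interface Row {}

    /** Moves rows older than the given number of months out of MySQL, in batches. */
    public void archive(WarmStore mysql, ColdStore cold, int months) {
        LocalDate cutoff = LocalDate.now().minusMonths(months);
        List<Row> batch;
        while (!(batch = mysql.fetchOlderThan(cutoff, 1000)).isEmpty()) {
            cold.write(batch);    // copy the batch out first...
            mysql.delete(batch);  // ...then remove it from MySQL
        }
    }
}
```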

Sharding databases and tables is a hands-on operation you rarely get to perform, and most of the material online amounts to fitting good tires on the car before it leaves the factory.

Whereas most of the scenarios we actually face are changing the tires on a car already running down the highway: one slip and it's a "fatal crash".

If you have better approaches, you're welcome to discuss them in the comments.
