Database architecture (reposted)

1. An industry-wide hard problem: four solutions for "cross-database paging"

    1). Method 1: Global Vision Method

         a. Rewrite "order by time offset X limit Y" as "order by time offset 0 limit X+Y", and send it to all N shards

         b. The service layer sorts the N*(X+Y) returned rows in memory, then takes the Y records after offset X from the sorted result.

         The performance of this method degrades steadily as the user pages deeper.
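         To make the two steps concrete, here is a minimal Python sketch. It assumes a hypothetical shard_fetch(i, limit) helper that runs the rewritten query on shard i and returns rows (dicts with a "time" key) already sorted by time:

```python
import heapq

def global_vision_page(shard_fetch, n_shards, x, y):
    # a. rewrite "offset X limit Y" as "offset 0 limit X+Y" on every shard
    per_shard = [shard_fetch(i, x + y) for i in range(n_shards)]
    # b. merge the N*(X+Y) sorted lists in memory, then take rows [X, X+Y)
    merged = heapq.merge(*per_shard, key=lambda r: r["time"])
    return list(merged)[x : x + y]
```

         Every shard must return X+Y rows, so both the network transfer and the in-memory merge grow with the offset, which is exactly why deep paging gets slow.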

     2). Method 2: Business compromise method - prohibit page jump query

           a. Fetch the first page the normal way, and record time_max, the largest time on that page

           b. On every subsequent page turn, rewrite "order by time offset X limit Y" as "where time > $time_max order by time limit Y"

           Since every query returns just one page of data, performance stays constant no matter how deep the paging goes.
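           A sketch of one page turn under this scheme, again assuming a hypothetical shard_fetch(i, sql) helper that returns rows sorted by time:

```python
def fetch_next_page(shard_fetch, n_shards, time_max, y):
    # Each shard starts after the previous page's largest time, so no
    # shard ever scans more than y rows, at any page depth.
    sql = f"SELECT * FROM t WHERE time > {time_max} ORDER BY time LIMIT {y}"
    rows = [r for i in range(n_shards) for r in shard_fetch(i, sql)]
    page = sorted(rows, key=lambda r: r["time"])[:y]
    new_time_max = page[-1]["time"] if page else time_max
    return page, new_time_max  # feed new_time_max into the next call
```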

      3). Method 3: Business compromise method - allow fuzzy data (each database returns an equal share of the page, losing some precision)

            a. Rewrite "order by time offset X limit Y" as "order by time offset X/N limit Y/N" on each of the N shards

      4). Method 4: Secondary query method

          a. Rewrite "order by time offset X limit Y" as "order by time offset X/N limit Y", and send it to every shard

          b. Take the minimum of all returned times as time_min

          c. Run a between second query on each shard: "order by time where time between $time_min and $time_i_max", where time_i_max is that shard's largest time from the first query

          d. Treating time_min as a virtual record, find its offset in each sub-library, and sum these to obtain time_min's global offset

          e. Knowing time_min's global offset, the global "offset X limit Y" page can be read straight out of the merged second-query results
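          Since the five steps are easy to mix up, here is a minimal sketch under the same assumptions (a hypothetical shard_fetch(i, sql) helper, rows sorted by time, and, for simplicity, distinct time values; the original argues the target page always falls inside the second-query window):

```python
def secondary_query_page(shard_fetch, n, x, y):
    # a. first query: "offset X/N limit Y" on every shard (assumes N | X)
    step = x // n
    first = [shard_fetch(i, f"SELECT * FROM t ORDER BY time "
                            f"LIMIT {y} OFFSET {step}") for i in range(n)]
    # b. time_min: the smallest time among all first-query first rows
    time_min = min(rows[0]["time"] for rows in first)
    # c. second query: between time_min and each shard's own max time
    second = []
    for i in range(n):
        t_max = first[i][-1]["time"]
        second.append(shard_fetch(
            i, f"SELECT * FROM t WHERE time BETWEEN {time_min} AND {t_max} "
               f"ORDER BY time"))
    # d. time_min's local offset on shard i is X/N minus the rows the
    #    second query uncovered before that shard's first-query first row;
    #    summing the local offsets gives time_min's global offset
    global_offset_of_min = 0
    for i in range(n):
        extra = sum(1 for r in second[i] if r["time"] < first[i][0]["time"])
        global_offset_of_min += step - extra
    # e. merge the second-query results; time_min's global offset is
    #    known, so the page [X, X+Y) is a plain slice of the merged list
    merged = sorted((r for rows in second for r in rows),
                    key=lambda r: r["time"])
    start = x - global_offset_of_min
    return merged[start : start + y]
```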

 

2. Single-KEY business: horizontal sharding architecture in practice

    1). Horizontal sharding methods

        a. Range method: using uid, the business primary key of the user center, as the partitioning key, split the data horizontally across two database instances:

           user-db1: stores data for uids from 0 to 10 million

           user-db2: stores data for uids from 10 million to 20 million

           a). Advantages of the range method: the split rule is simple, and given a uid the user-center can quickly locate the right database by range; expansion is simple, since if capacity runs out it is enough to add a user-db3

           b). Shortcomings of the range method:

           uid must be monotonically increasing

           Data volume is uneven: a newly added user-db3 holds very little data at first

           Request volume is uneven: newly registered users are generally the most active, so user-db2 tends to carry a higher load than user-db1, leaving server utilization unbalanced

       b. Hash method: again keyed on uid, the business primary key of the user center, splitting the data horizontally across two database instances:

          user-db1: stores data for uids with uid % 2 == 1

          user-db2: stores data for uids with uid % 2 == 0

          a). Advantages of the hash method: the split rule is simple, and given a uid the user-center can quickly locate the right library by hashing (taking the modulus).

              Data volume is balanced: as long as uids are fairly uniform, data is evenly distributed across the libraries.

             Request volume is balanced: as long as uids are fairly uniform, load is evenly distributed across the libraries

         b). Shortcoming of the hash method: expansion is troublesome. If capacity runs out, adding a library changes the hash and may force data migration; how to migrate data smoothly is a problem that has to be solved.
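         A toy routing layer showing both methods side by side (the names user-db1/user-db2 follow the text; the split table itself is illustrative):

```python
RANGE_SPLITS = [
    (0, 10_000_000, "user-db1"),            # range method: uid in [0, 10M)
    (10_000_000, 20_000_000, "user-db2"),   # uid in [10M, 20M)
]

def route_by_range(uid):
    # Expansion is one extra row (user-db3), but data/load stay uneven.
    for lo, hi, db in RANGE_SPLITS:
        if lo <= uid < hi:
            return db
    raise ValueError("uid out of range")

def route_by_hash(uid):
    # Per the text: uid % 2 == 1 -> user-db1, uid % 2 == 0 -> user-db2.
    # Balanced as long as uids are uniform, but changing the modulus
    # re-shuffles nearly every row: the migration problem noted above.
    return "user-db1" if uid % 2 == 1 else "user-db2"
```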

     2). Problems caused by horizontally sharding the user center:

         Queries on the uid attribute route directly: to access the data for uid=124, taking the modulus immediately identifies the library. Queries on non-uid attributes, such as login_name, are painful:

         To access the data for login_name=shenjian, we do not know which library holds the row, so all libraries usually have to be scanned. As the number of sub-databases grows, performance drops off sharply.

     3). In the author's experience over the years, business requirements on non-uid attributes of a user center usually fall into two categories:

         a. User side (foreground access), with two typical classes of requirement:

            User login: look up the user entity by login_name/phone/email; about 1% of requests are of this type

            User information query: after login, look up the user record by uid; about 99% of requests are of this type

            User-side queries are mostly single-record lookups with heavy traffic; the service must be highly available, and consistency requirements are high.

         b. Operation side (background access): access patterns vary with product and operations needs, querying by age, gender, avatar, login time, registration time, and so on.

           Operation-side queries are mostly batch paging queries. Being an internal system, traffic is very low, availability requirements are modest, and consistency requirements are not strict.

           What architecture can serve these two very different classes of business requirement?

      4). Architecture ideas for a horizontally sharded user center

          With a large data volume, the user center shards horizontally by uid. For query requirements on non-uid attributes, the core ideas of the architecture design are:

         For the user side, adopt the scheme of "establishing a mapping from non-uid attributes to uid"

         For the operation side, adopt the scheme of "separating the foreground and the background"

     5). User Center - Best Practices on the User Side

         a. Index table method 

            Idea: uid locates the library directly, login_name does not; if login_name can be resolved to uid, the problem is solved

            Solution:

            Create an index table that records the login_name -> uid mapping

            When accessing by login_name, first query the index table for the uid, then locate the corresponding library

            The index table has few columns and can hold a great deal of data, so it generally does not itself need to be sharded

            If it does grow too large, the index table can be sharded by login_name

            Potential shortcoming: one extra database query, roughly doubling lookup latency

        b. Cache mapping method:

           Idea: querying the index table costs a database round trip; it is better to keep the mapping in a cache

           Solution:

           A login_name query first looks up the uid in the cache, then routes to the database by uid

           On a cache miss, scan all the libraries to find the uid for login_name, and put the mapping into the cache

           The login_name-to-uid mapping never changes, so once cached it never needs to be evicted, and the hit rate is very high

           If the data volume is too large, the cache itself can be split horizontally by login_name

           Potential shortcoming: one extra cache lookup
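           A sketch of the cache-mapping lookup. The dict-like cache and the list of shard connections exposing a query(sql, params) method are both assumptions of the sketch:

```python
def uid_by_login_name(login_name, cache, shards):
    uid = cache.get(login_name)
    if uid is None:
        # Cache miss: scan every shard once to find the owner of login_name...
        for db in shards:
            rows = db.query(
                "SELECT uid FROM user WHERE login_name = %s", (login_name,))
            if rows:
                uid = rows[0]["uid"]
                break
        # ...and remember the mapping; it never changes, so no eviction needed.
        cache[login_name] = uid
    return uid  # the caller now routes by uid as usual
```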

        c. Generating uid from login_name:

          Idea: skip the remote lookup entirely and derive the uid directly from login_name

          Solution:

          At registration, a purpose-built function generates the uid from login_name, uid = f(login_name), and the row is inserted into the shard chosen by uid

          When accessing by login_name, first compute the uid with the same function, uid = f(login_name), then route by uid to the corresponding library

          Potential shortcomings: the function must be designed very carefully, and there is a risk of uid collisions

       d. Folding a login_name "gene" into uid

          Idea: even if uid cannot be generated from login_name outright, a "gene" can be extracted from login_name and folded into the uid. Suppose the data is split into 8 libraries routed by uid%8; the implication is that the low 3 bits of uid decide which library a row lands in, and those 3 bits are the so-called "gene".

          Solution: At registration, a purpose-built function derives a 3-bit gene from login_name, login_name_gene = f(login_name), shown as the pink part of the figure above

                           At the same time, a 61-bit globally unique id is generated as the user's identifier, shown as the green part of the figure above

                           The 3-bit login_name_gene is then used as part of the uid, shown as the yellow part of the figure above

                           A 64-bit uid is assembled from the id and the login_name_gene, and the row is inserted into the shard chosen by uid

                           When accessing by login_name, the same function recovers the 3-bit gene, login_name_gene = f(login_name), and login_name_gene % 8 locates the library directly
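          A minimal sketch of the gene scheme for 8 libraries. The hash choice and bit layout here are assumptions; any stable 3-bit derivation works:

```python
import zlib

def gene3(login_name: str) -> int:
    # f(login_name): a stable 3-bit "gene" derived from the login name
    # (crc32 is illustrative only; any stable hash would do).
    return zlib.crc32(login_name.encode()) & 0b111

def make_uid(global_id_61bit: int, login_name: str) -> int:
    # 61-bit globally unique id in the high bits + 3-bit gene in the low bits.
    return (global_id_61bit << 3) | gene3(login_name)

# Routing: uid % 8 reads exactly the low 3 bits, i.e. the gene, so a
# login_name query lands on the same library with no uid lookup at all:
#   at registration: uid = make_uid(next_global_id(), "shenjian")
#   at login:        library = gene3("shenjian") % 8   ==   uid % 8
```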

       e. User Center - Best Practices on the Operation Side

          On the foreground user side, requirements are mostly single-record accesses; establishing mappings from the non-uid attributes login_name / phone / email to uid solves the problem.

          On the background operation side, needs differ: access is mostly batch paging. Such queries are computation-heavy, return large result sets, and eat database capacity.

          If the foreground and background businesses share one set of services and one database, the "inefficient" batch queries issued by a "handful" of background requests can occasionally drive database CPU to 100% and disturb normal foreground users (for example, login timeouts).

          In addition, to satisfy the background's assorted "odd-shaped" needs, all kinds of indexes tend to be piled onto the database. These indexes occupy a great deal of memory and markedly degrade the foreground's uid/login_name query and write performance, increasing processing time.

          For this kind of business, the "foreground/background separation" architecture should be adopted:

          The foreground (user-side) architecture stays unchanged, while the background (product/operations) requirements are served by an independently extracted web/service/db stack, decoupling the two systems. For background business that is "complex", "low-concurrency", "not requiring high availability", and "tolerant of some delay":

          The service layer can be removed; the operations-background web layer can access the db directly through a DAO

          No reverse proxy and no cluster redundancy are required

          There is no need to hit the live library; data can be synchronized asynchronously via MQ or offline jobs

          When the database is very large, an "external index" or a Hive-style design, better suited to huge data volumes and tolerant of higher latency, can be adopted

       f. Summary

         Taking "user center" as a typical "single KEY" type of business, the architecture point of horizontal segmentation, this article makes some introductions.

         Horizontal sharding methods: range method; hash method

         Problem encountered after sharding: queries on the uid attribute locate the library directly, while queries on non-uid attributes cannot

         Typical businesses querying by non-uid attributes: the user side (foreground access; single-record queries, heavy traffic, high availability, high consistency requirements) and the operation side (background access; patterns vary with product and operations needs, mostly batch paging; an internal system with very low traffic, modest availability requirements, and loose consistency requirements).

         The architecture design ideas for these two types of business:

         For the user side, adopt the scheme of "establishing a mapping from non-uid attributes to uid"

         For the operation side, adopt the scheme of "separating the foreground and the background"

          On the user-side foreground, best practices for "mapping non-uid attributes to uid" are:

         Index table method: keep the login_name -> uid mapping in a database table

        Cache mapping method: keep the login_name -> uid mapping in a cache

        Generate uid from login_name

        Fold a login_name gene into uid

        On the operations background, best practices for "foreground/background separation" are:

        Separate and decouple the foreground and background web/service/db stacks, so inefficient background queries cannot cause jitter in foreground queries

        Data redundancy design can be adopted

        "External index" (such as ES search system) or "big data processing" (such as HIVE) can be used to meet the abnormal query requirements in the background

 

3. Architecture design for 10 billion rows with 10,000 attributes

     1). What is the version + ext scheme for schema extension?

          Use an ext field to carry the personalized attributes of different business needs, and a version field to identify the meaning of each key inside ext.

          Advantages: a. attributes can be extended dynamically at any time, giving good extensibility; b. old and new data can coexist, giving good compatibility

          Shortcomings: a. the fields inside ext cannot be indexed; b. the keys inside ext are stored redundantly over and over, so keys should be kept short

     2). How can heterogeneous data of different categories be stored uniformly? With an approach similar to version+ext:

          tiezi(tid, uid, time, title, cate, subcate, xxid, ext)

          a. Extract the handful of common fields and store them as ordinary columns

          b. Let cate, subcate, xxid, etc. define what ext means (playing much the same role as version)

          c. Carry each business line's personalized needs inside ext
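          For example, two heterogeneous posts can share the one tiezi schema like this (the field names inside ext are purely illustrative):

```python
import json

# Common fields are real columns; everything category-specific lives in
# ext, and (cate, subcate) tell the reader how to interpret it.
job_post = {
    "tid": 1, "uid": 42, "time": 1690000000, "title": "hiring",
    "cate": "job", "subcate": "it", "xxid": None,
    "ext": json.dumps({"company": "58", "salary": "20k-30k"}),
}
house_post = {
    "tid": 2, "uid": 43, "time": 1690000001, "title": "2 rooms for rent",
    "cate": "house", "subcate": "rent", "xxid": None,
    "ext": json.dumps({"area_m2": 80, "floor": 7}),
}
```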

     3). With the storage of massive heterogeneous data solved, new problems appear:

          a. Every record stores its ext keys again and again, wasting a lot of space. Can the storage be compressed?

          b. cateid is no longer enough to describe the contents of ext; categories form hierarchies of uncertain depth. Can ext be made self-describing?

          c. New attributes must remain addable at any time, preserving extensibility

     4). Unified category attribute service

         A unified category-and-attribute service is abstracted out to manage this metadata separately, and each JSON key in the post library's ext field is represented by a number, reducing storage space.

         What each number means, which subcategory it belongs to, and the validity constraints on its value are all stored in the category-and-attribute service.

         In addition, when an ext key's value is constrained not by a regular-expression check but by an enumeration, an enumeration table must define the legal values for validation.

     5). Unified search service

          When the data volume is large, no combination of database indexes can satisfy the query needs across so many different attributes. What then?

          The early engineers of 58.com settled on the technical route of "external index, unified retrieval service" long ago:

          a. The database serves forward lookups by "post id"

          b. All personalized retrieval needs other than "post id" lookups go uniformly to the external index

      6). Operations on metadata and index data follow these rules:

          a. Forward lookups of a post by tid go directly to the post service

          b. When a post is modified, the post service notifies the retrieval service, which updates the index in step

          c. Complex queries over posts are served by the retrieval service

 

4. An architecture for smooth, second-level database expansion

    1). Deployment scheme:

         a. In a high-concurrency, high-traffic Internet architecture, there is generally a service layer above the database. The service layer records the mapping between "business database name" and "database instance", and routes SQL statements through a connection pool to the right database for execution.

         b. As data volume grows, the data is split horizontally and distributed across different database instances (even different physical machines), reducing per-instance data volume and improving performance.

         c. Internet architectures must keep the database highly available; a common approach is dual-master replication + keepalived + virtual IPs

         d. Combining (b) and (c), real production architectures carry both horizontal splits and high-availability guarantees

         Question: what if the data volume keeps growing and two libraries can no longer carry the load?

         Answer: keep splitting horizontally into more libraries, reducing per-library data volume and adding primary instances (machines) to raise performance.

    2). Service-suspension plan: stop all services and migrate the data.

         Rollback plan: if the migration fails, or post-migration testing fails, switch the configuration back to the original x libraries, restore service, and announce a new maintenance window for another day.

         Advantage of the scheme: simplicity

         Disadvantages of the scheme: a. services stop; not highly available

                           b. the engineers are under heavy time pressure, with all work to finish inside the window; experience says the greater the pressure, the easier it is to make mistakes (and that is fatal)

                           c. if a problem slips through the initial checks and only surfaces after the service has run for a while, rollback becomes difficult and some data may be lost

     3). The second-level, smooth, and handsome solution

         a. Modify the configuration

            There are two main changes:

            a). Give each database machine dual virtual IPs: the original %2=0 library keeps its virtual ip0 and gains an additional virtual ip00; the %2=1 library is treated the same way.

            b). Modify the service configuration (whether in a configuration file or a configuration center), turning the 2-library database configuration into a 4-library one; when modifying, mind the mapping between the old and new libraries:

                Data that was %2=0 becomes %4=0 and %4=2;

                Data that was %2=1 becomes %4=1 and %4=3;

                This mapping guarantees that data still routes correctly after the split.
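                In code form, the configuration change is just a re-keying of the routing map (the virtual-IP names beyond ip0/ip00 are illustrative):

```python
# Before: 2 libraries, route by uid % 2
old_config = {0: "ip0", 1: "ip1"}

# After: 4 logical libraries over the same machines, reachable via the
# added virtual IPs. The old %2 buckets split cleanly into %4 buckets,
# so every existing row is still found under the new routing:
new_config = {
    0: "ip0", 2: "ip00",   # former %2 == 0 data
    1: "ip1", 3: "ip11",   # former %2 == 1 data
}

def route(uid, config):
    return config[uid % len(config)]
```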

         b. Reload the configuration and expand the instances

            The service layer reloads its configuration, which can happen in several ways:

            a). The more primitive way: restart the services so they read the new configuration file

            b). More elegantly: the configuration center signals the services to re-read the configuration and re-initialize their database connection pools

                Either way, once the reload finishes, the database instance expansion is complete: where 2 database instances used to serve, 4 now do. This step generally completes in seconds.

                The whole rollout can proceed service by service, with no impact on correctness or availability:

            a). Even while %2 routing and %4 routing coexist, data correctness is unaffected, because dual-master replication still keeps each pair of libraries in sync

            b). A service instance takes no traffic while it reloads, and redundant service instances preserve high availability

               After the instance expansion, each library still holds just as much data, so a third step must do some finishing work

        c. Finishing work, data shrinkage:

           The finishing touches include:

           a). Change the dual virtual IPs back to single virtual IPs

           b). Break the old dual-master replication, so the paired libraries stop growing in lock-step

           c). Set up new dual-master replication to restore high availability

           d). Delete the now-redundant data; for example, on ip0 delete all rows with %4=2, leaving it to serve only %4=0 data

               Each library's data volume thus falls to half of what it was, completing the shrinkage.

 

5. Smooth migration of 10 billion rows without affecting service

      For many Internet business scenarios with "large data volume, high concurrency, and high business complexity", needs like the following often arise:

       a. Changing the underlying table structure

       b. Changing the number of sub-libraries (shards)

       c. Changing the underlying storage medium

       Two solutions are commonly used to achieve "smooth data migration: no downtime during the migration, and uninterrupted service".

      1). The follow-the-log method, in five steps:

          a. Upgrade the service to log every "data modification on the old library"

          b. Develop a small data-migration tool and migrate the data

          c. Develop a small log-reading tool that replays the logged changes to close the data gap

          d. Develop a small data-comparison tool and verify data consistency

          e. Switch traffic to the new library, completing the smooth migration

      2). The double-write method, in four steps:

          a. Upgrade the service so every "data modification on the old library" is also written to the new library (see the sketch after this list)

          b. Develop a small data-migration tool and migrate the data

          c. Develop a small data-comparison tool and verify data consistency

          d. Switch traffic to the new library, completing the smooth migration
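          A sketch of the service upgrade in step a (old_db and new_db, each with an insert method, are assumed connection objects):

```python
import logging

log = logging.getLogger("migration")

def save(record, old_db, new_db):
    old_db.insert(record)        # the old library stays the source of truth
    try:
        new_db.insert(record)    # best-effort double write to the new library
    except Exception as exc:
        # Differences are tolerated here; the data-comparison tool of
        # step c finds and repairs them before the traffic switch.
        log.warning("double write to new library failed: %s", exc)
```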

 

6. Three schemes for redundant data in MySQL

    1). Why redundant data

        For example: an order business where both buyers and sellers need to query orders:

        Order(oid, info_detail);

        T(buyer_id, seller_id, oid);

        If the table is sharded by buyer_id, queries by seller_id must scan every shard.

        If it is sharded by seller_id, queries by buyer_id must scan every shard.

        Data redundancy can then serve the queries on buyer_id and seller_id separately:

        T1(buyer_id, seller_id, oid)

        T2(seller_id, buyer_id, oid)

        The same data exists in two redundant copies: one sharded by buyer_id to serve buyer queries, the other sharded by seller_id to serve seller queries.

   2). Synchronous double write by the service

       As the name implies, the service layer writes the redundant data synchronously, as shown in Figure 1-4 above:

       The business side calls the service to add data

       The service first inserts the T1 row

       The service then inserts the T2 row

       The service returns success to the business side

       Advantages:

       Not complex: the service layer merely goes from one write to two

       Relatively high data consistency (success is returned only after both writes succeed)

       Shortcomings:

       Longer request processing time (two inserts instead of one, doubling the write work)

        The data can still end up inconsistent; for example, if the service restarts after the T1 write but before the T2 write, the row never reaches T2 (see the sketch below)
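        A sketch of the synchronous double write (t1_db and t2_db, each with an execute method, are assumed shard connections; note that no distributed transaction protects the pair):

```python
def add_order(t1_db, t2_db, buyer_id, seller_id, oid):
    t1_db.execute(
        "INSERT INTO T1(buyer_id, seller_id, oid) VALUES (%s, %s, %s)",
        (buyer_id, seller_id, oid))    # shard chosen by buyer_id
    # A crash right here leaves T1 and T2 inconsistent: the weak spot above.
    t2_db.execute(
        "INSERT INTO T2(seller_id, buyer_id, oid) VALUES (%s, %s, %s)",
        (seller_id, buyer_id, oid))    # shard chosen by seller_id
    return True                        # success only after both inserts
```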

    3). Asynchronous double write by the service

        The second write is no longer performed inline by the service. The service layer sends an asynchronous message, via a message bus, to a dedicated data-replication service that writes the redundant data, as shown in Figure 1-6 above:

        The business side calls the service to add data

        The service first inserts the T1 row

        The service sends an asynchronous message onto the message bus (fire and forget: no need to wait for a reply, and it normally completes quickly)

        The service returns success to the business side

        The message bus delivers the message to the data-synchronization center

        The data-synchronization center inserts the T2 row

        Advantage: short request processing time (only one inline insert)

       Disadvantages: system complexity grows, with one more component (the message bus) and one more service (the dedicated data-replication service) introduced.

       Because success is returned to the business side before the row necessarily reaches T2, there is a window of inconsistency (the window is very short, and the data is eventually consistent)

       If the message bus loses a message, the redundant tables stay inconsistent

       With either synchronous or asynchronous double write, the service itself must shoulder the complexity that "redundant data" brings. To decouple the system from data redundancy, a third common scheme is introduced.

        When the system is particularly sensitive to processing time, the second scheme, asynchronous service double write, is the usual choice.

   4). Offline asynchronous double write:

        To shield the service from the complexity of "redundant data", the second write is performed not by the service layer but by an offline service or task, as shown in Figure 1-6 above:

        The business side calls the service to add data

        The service inserts the T1 row

        The service returns success to the business side

        The write lands in the database's log

        An offline service or task reads the database log

        The offline service or task inserts the T2 row

        Advantages: the double write is completely decoupled from the business; request processing time is short (only one inline insert)

        Disadvantages: when success is returned to the business side, the row may not yet be in T2, so there is a window of inconsistency (the window is very short, and the data is eventually consistent)

                  Data consistency ends up depending on the reliability of the offline service or task
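        A sketch of the offline task, assuming a hypothetical read_insert_events() generator that yields parsed insert events from the database log (a real deployment would tail the MySQL binlog with a dedicated client):

```python
def replicate_to_t2(read_insert_events, t2_db):
    # The online service never touches this code path: the double write
    # is fully decoupled, at the cost of a short inconsistency window.
    for ev in read_insert_events():
        if ev["table"] == "T1":
            row = ev["row"]
            t2_db.execute(
                "INSERT INTO T2(seller_id, buyer_id, oid) VALUES (%s, %s, %s)",
                (row["seller_id"], row["buyer_id"], row["oid"]))
```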

   5). Summary:

        Internet business scenarios with large data volumes often:

        Use horizontal sharding to reduce per-library data volume

        Use denormalized, redundant data to meet the query needs of different dimensions

       Data redundancy is most easily implemented with synchronous service double write

       To reduce latency, this can be optimized into asynchronous service double write

       To shield services from the complexity of "redundant data", it can be further optimized into offline asynchronous double write

 

Content reposted from the WeChat public account: The Road of Architects
