MySQL basic principles and concepts


1. Introduction

With the widespread popularity of Internet applications, storing and accessing massive amounts of data has become a bottleneck in system design. For a large-scale Internet application, billions of page views (PVs) per day put a very high load on the database and cause serious problems for the stability and scalability of the system. Improving website performance through data segmentation, that is, scaling the data layer horizontally, has therefore become the preferred approach for architects.

Splitting the database horizontally reduces the load on any single machine and minimizes the loss caused by downtime. A load-balancing strategy lowers the access load on each machine and makes downtime less likely; a cluster solution removes the single point of failure that makes the database inaccessible when one server goes down; and a read-write separation strategy maximizes the speed and concurrency of read operations in the application.

At present, many large-scale Internet applications in China, such as Taobao, Alibaba, and Tencent, have adopted data segmentation schemes of this kind, and most of them have implemented their own distributed data access layer (DDAL). Classified by implementation method and implementation level, and taking Java applications as an example, these fall roughly into two levels: encapsulation at the JDBC layer and implementation at the ORM framework layer. Below, Brothers Education (www.lampbrother.net) analyzes the basic principles and concepts.



2. Basic principles and concepts

2.1 Basic principles:

  People usually come to understand a problem in three steps: what it is, why it is needed, and how to do it. This article discusses these three questions in turn:

2.1.1 What is data segmentation

The word "shard" means "fragment" in English, and as a database-related technical term it seems to have first appeared in MMORPGs. "Sharding" means splitting data into such fragments. Sharding is not a new technology but a relatively simple software concept. As is well known, table partitioning only became available in MySQL 5 (specifically, from version 5.1). Before that, many potential MySQL users had concerns about its scalability, and whether a database offered partitioning became a key metric (though certainly not the only one) for judging how scalable it was. Database scalability is an eternal topic, and MySQL advocates are often asked: when the application data is too much for a single database to handle and has to be partitioned, how should that be done? The answer is sharding. Sharding is not a feature attached to a specific database product but an abstraction above the specific technical details. It is a solution for horizontal scaling (scale out), and its main purpose is to break through the I/O capacity limit of a single-node database server and so solve the problem of database scalability.

  Data is distributed horizontally across different DBs or tables according to a set of segmentation rules, and the specific DB or table to query is then located through the corresponding DB routing or table routing rules. The "sharding" discussed here usually means horizontal segmentation, which is also the focus of this article. What do the segmentation and routing methods actually look like? Readers will naturally have this question at this point, so let us take a simple example: the posts in a blog application. Suppose the article table has the following fields:

article_id(int), title(varchar(128)), content(varchar(1024)), user_id(int)

  Faced with such a table, how do we segment it? How do we distribute its data to tables in different databases? If we analyze the blog application, the answer is not hard to find: its users fall into two types, viewers and blog owners. A viewer always browses within the blog of one specific user, and a blog owner managing his own blog also operates within the blog of one specific user (his own space). This "specific user" is represented in the database by the field user_id, and it is this user_id that serves as the basis for splitting the databases and for the rules we need. We can put all articles with user_id 1~10000 into the article table in DB1, all articles with user_id 10001~20000 into the article table in DB2, and so on, up to DBn. In this way, the article data is naturally divided among the databases, and the goal of data segmentation is achieved.

  The next problem is how to find the specific database. The answer is just as simple. Since we used the distinguishing field user_id to split the databases, database routing naturally depends on user_id as well. Consider the blog application again: whether I am visiting someone else's blog or managing my own, I always have to know whose blog it is, so the user_id of the blog is known. Applying the splitting rule to this user_id in reverse locates the specific database. For example, if the user_id is 234, the rule places it in DB1; if the user_id is 12343, the same rule places it in DB2. We call this process of using the splitting rule to route back to a specific DB "DB routing".
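The range rule described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function name route_db and the constant USERS_PER_DB are hypothetical, the range width of 10000 is taken from the example in the text, and the result is just a database name rather than a real connection.

```python
USERS_PER_DB = 10_000  # range width from the example: 1~10000 -> DB1, 10001~20000 -> DB2


def route_db(user_id: int, users_per_db: int = USERS_PER_DB) -> str:
    """Return the name of the database that holds this user's articles.

    Hypothetical helper: it applies the range-based splitting rule in
    reverse to map a user_id to DB1, DB2, ..., DBn.
    """
    if user_id < 1:
        raise ValueError("user_id must be a positive integer")
    # Subtract 1 so that the upper bound of each range (10000, 20000, ...)
    # stays in the lower database instead of spilling into the next one.
    index = (user_id - 1) // users_per_db + 1
    return f"DB{index}"


if __name__ == "__main__":
    for uid in (234, 10000, 10001, 12343):
        print(uid, "->", route_db(uid))
```

With this rule, user_id 234 is routed to DB1 and user_id 12343 to DB2, matching the two examples in the text. A real data access layer would map the returned name to a connection or data source.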

  Of course, a DB design that takes data segmentation into account is necessarily an unconventional, unorthodox design. So what does an orthodox DB design look like?

  It is essentially the design we produce by following the usual rules. Normally we consciously design the database according to the normal forms, and at high-load points we may consider a replication mechanism to improve read and write throughput and performance. This may already meet many needs, but the shortcomings of this mechanism are still fairly obvious (discussed below). A DB design that takes data segmentation into account departs from this "consciously designed according to the normal forms" approach and violates the usual rules and constraints. In order to segment the data, we have to add redundant fields to the database tables to act as the distinguishing field, also called the sharding marker field, such as the user_id field in the article example above. Of course, redundant fields do not appear only in sharding scenarios. Many large-scale applications also need redundancy, which is a matter of designing an efficient DB.
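To make the idea of a redundant distinguishing field concrete, here is a small Python sketch. Everything in it is an assumption made for illustration: the Comment record is a hypothetical table that does not appear in the original example, and route_db repeats the range rule of 10000 users per database. In a fully normalized design, a comment would store only article_id, and the blog owner would be looked up through the article table. Carrying user_id redundantly lets the row be routed without that extra lookup.

```python
from dataclasses import dataclass

USERS_PER_DB = 10_000  # assumed range width, as in the article example


def route_db(user_id: int) -> str:
    """Hypothetical range rule: user_id 1~10000 -> DB1, 10001~20000 -> DB2, ..."""
    if user_id < 1:
        raise ValueError("user_id must be a positive integer")
    return f"DB{(user_id - 1) // USERS_PER_DB + 1}"


@dataclass
class Comment:
    """Hypothetical comment row in a sharded blog schema."""
    comment_id: int
    article_id: int
    body: str
    # Redundant from a normalization point of view (it could be derived
    # from article_id), but kept as the distinguishing field so the row
    # can be routed to the same database as the blog it belongs to.
    user_id: int


def db_for_comment(comment: Comment) -> str:
    """Route a comment using only the redundant user_id field."""
    return route_db(comment.user_id)


if __name__ == "__main__":
    c = Comment(comment_id=1, article_id=99, body="Nice post", user_id=12343)
    print(db_for_comment(c))
```

The design trade-off is the one the text describes: the table breaks strict normalization, but every row can locate its own database from a single field.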

