MySQL partitioning

1. What is table partitioning
In plain terms, table partitioning divides a large table into several smaller pieces according to a set of conditions. MySQL has supported table partitioning since version 5.1.
For example, if a user table holds more than 6 million records, it can be partitioned by the date the records were stored or by location. Other conditions can be used as well.

2. Why partition a table
Partitioning improves the scalability and manageability of large tables and of tables with varied access patterns, and it improves database efficiency.
Some of the advantages of partitioning include:
      1) More data can be stored than fits on a single disk or file system partition.
      2) Data that is no longer worth keeping can usually be removed easily by dropping the partitions that contain it. Conversely, adding new data can in some cases be made easier by adding a new partition specifically for that data.
Other advantages generally associated with partitioning are listed below. At the time this was written for MySQL 5.1, these features were not yet implemented but were high on the MySQL developers' priority list for the 5.1 production release.
      3) Some queries can be greatly optimized, mainly because the data satisfying a given WHERE clause can be stored in only one or a few partitions, so the remaining partitions need not be searched. Because partitions can be altered after a partitioned table is created, you can reorganize the data to speed up frequent queries even if the original partitioning scheme did not anticipate them.
      4) Queries involving aggregate functions such as SUM() and COUNT() can easily be processed in parallel. A simple example of such a query is "SELECT salesperson_id, COUNT(orders) AS order_total FROM sales GROUP BY salesperson_id;". "Parallel" here means that the query can run on each partition concurrently, and the final result is obtained simply by summing the results from all partitions.
      5) Greater query throughput can be achieved by spreading data seeks across multiple disks.
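The idea in item 4 can be sketched in Python: count within each partition independently, then merge the partial results. This only illustrates the principle and is not how MySQL executes the query; the sample rows, the partition layout, and the function names are made up for the example.

```python
from collections import Counter

# Hypothetical data: each partition holds the salesperson_id of its order rows.
partitions = [
    [1, 2, 2, 3],  # partition p0
    [1, 1, 3],     # partition p1
    [2, 3, 3, 3],  # partition p2
]

def count_orders(partition_rows):
    """Partial COUNT(...) GROUP BY salesperson_id for a single partition."""
    return Counter(partition_rows)

def merge_counts(partials):
    """Sum the per-partition partial counts into the final result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)

# Each count_orders call is independent of the others, so they could run concurrently.
order_totals = merge_counts(count_orders(rows) for rows in partitions)
print(order_totals)
```

Because every partial count depends only on its own partition, the merge step is a plain sum per salesperson_id.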

3. Partition types

· RANGE partitioning: Assigns rows to partitions based on column values falling within a given contiguous range.
· LIST partitioning: Similar to RANGE partitioning, except that the partition is selected based on the column value matching one of a set of discrete values.
· HASH partitioning: The partition is selected based on the value returned by a user-defined expression that operates on column values of the rows to be inserted into the table. The expression can be any expression valid in MySQL that yields a non-negative integer value.
· KEY partitioning: Similar to HASH partitioning, except that you supply only one or more columns to be evaluated and the MySQL server provides its own hash function. The early MySQL 5.1 text this article follows requires one or more of the columns to contain integer values; later MySQL releases accept other column types as well.
RANGE partitioning
       RANGE partitioning assigns rows to partitions based on column values that fall within a given contiguous range.
       The ranges must be contiguous and must not overlap, and they are defined using the VALUES LESS THAN operator. The following are examples.

Sql code 
CREATE TABLE employees ( 
    id INT NOT NULL, 
    fname VARCHAR(30), 
    lname VARCHAR(30), 
    hired DATE NOT NULL DEFAULT '1970-01-01', 
    separated DATE NOT NULL DEFAULT '9999-12-31', 
    job_code INT NOT NULL, 
    store_id INT NOT NULL
)
PARTITION BY RANGE (store_id) (
    PARTITION p0 VALUES LESS THAN (6),
    PARTITION p1 VALUES LESS THAN (11),
    PARTITION p2 VALUES LESS THAN (16),
    PARTITION p3 VALUES LESS THAN (21)
);
       Under this partitioning scheme, all rows for employees working at stores 1 to 5 are kept in partition p0, those for employees at stores 6 to 10 in p1, and so on. Note that the partitions are defined in order, from lowest to highest. This is a requirement of the PARTITION BY RANGE syntax; in this respect it is similar to a "switch ... case" statement in C or Java.
       A new row containing the data (72, 'Michael', 'Widenius', '1998-06-25', NULL, 13) is easily determined to go into partition p2. But what happens when a 21st store is added? Under this scheme there is no rule covering stores with a store_id greater than 20, so the server does not know where to save the row and raises an error. You can avoid this error by using a "catchall" VALUES LESS THAN clause in the CREATE TABLE statement that covers all values greater than the highest value explicitly specified:
Sql code 
CREATE TABLE employees ( 
    id INT NOT NULL, 
    fname VARCHAR(30), 
    lname VARCHAR(30), 
    hired DATE NOT NULL DEFAULT '1970-01-01', 
    separated DATE NOT NULL DEFAULT '9999-12-31', 
    job_code INT NOT NULL, 
    store_id INT NOT NULL
)
PARTITION BY RANGE (store_id) ( 
    PARTITION p0 VALUES LESS THAN (6), 
    PARTITION p1 VALUES LESS THAN (11), 
    PARTITION p2 VALUES LESS THAN (16), 
    PARTITION p3 VALUES LESS THAN MAXVALUE 
); 
    MAXVALUE represents the largest possible integer value. Now all rows with a store_id of 16 (the highest value defined) or greater are stored in partition p3. At some point in the future, when the number of stores has grown to 25, 30, or more, an ALTER TABLE statement can be used to add new partitions for stores 21-25, 26-30, and so on.
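The way VALUES LESS THAN bounds select a partition can be mimicked in a few lines of Python. This is only a sketch of the rule described here, not MySQL's implementation; range_partition, NAMES, and the MAXVALUE stand-in are names made up for illustration.

```python
import bisect

MAXVALUE = float("inf")  # illustrative stand-in for MySQL's MAXVALUE
NAMES = ["p0", "p1", "p2", "p3"]

def range_partition(value, bounds, names=NAMES):
    """Pick the first partition whose VALUES LESS THAN bound is above value."""
    idx = bisect.bisect_right(bounds, value)
    if idx == len(bounds):
        # No bound is greater than value: the server would reject the row.
        raise ValueError(f"no partition for value {value}")
    return names[idx]

print(range_partition(13, [6, 11, 16, 21]))        # store 13, scheme without catchall
print(range_partition(21, [6, 11, 16, MAXVALUE]))  # store 21, scheme with catchall
```

With the bounds (6, 11, 16, 21), a call for store 21 raises the error, which mirrors the failed insert; replacing the last bound with the catchall makes the same call succeed.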
     In much the same way, you can partition the table by employee job code, that is, by ranges of values in the job_code column. For example, assuming that two-digit job codes are used for regular (in-store) workers, three-digit codes for office and support staff, and four-digit codes for management, you can create the partitioned table with the following statement:
Sql code
CREATE TABLE employees ( 
    id INT NOT NULL, 
    fname VARCHAR(30), 
    lname VARCHAR(30), 
    hired DATE NOT NULL DEFAULT '1970-01-01', 
    separated DATE NOT NULL DEFAULT '9999-12-31', 
    job_code INT NOT NULL, 
    store_id INT NOT NULL
)
PARTITION BY RANGE (job_code) ( 
    PARTITION p0 VALUES LESS THAN (100), 
    PARTITION p1 VALUES LESS THAN (1000), 
    PARTITION p2 VALUES LESS THAN (10000) 
); 

In this example, all rows for in-store workers are stored in partition p0, all rows for office and support staff in partition p1, and all rows for managers in partition p2.
       It is also possible to use an expression in the VALUES LESS THAN clause. The most notable restriction is that MySQL must be able to evaluate the expression's return value as part of a LESS THAN (<) comparison, so the value of the expression cannot be NULL. For this reason, the hired, separated, job_code, and store_id columns of the employees table have been defined as NOT NULL.
       Besides partitioning by store number, you can partition the table using an expression based on one of the two DATE columns. For example, suppose you want to partition by the year in which each employee left the company, that is, by the value of YEAR(separated). A CREATE TABLE statement implementing this scheme is shown below:
Sql code 
CREATE TABLE employees ( 
    id INT NOT NULL, 
    fname VARCHAR(30), 
    lname VARCHAR(30), 
    hired DATE NOT NULL DEFAULT '1970-01-01', 
    separated DATE NOT NULL DEFAULT '9999-12-31', 
    job_code INT, 
    store_id INT
)
PARTITION BY RANGE (YEAR(separated)) ( 
    PARTITION p0 VALUES LESS THAN (1991) , 
    PARTITION p1 VALUES LESS THAN (1996), 
    PARTITION p2 VALUES LESS THAN (2001), 
    PARTITION p3 VALUES LESS THAN MAXVALUE 
); 
In this scheme, the rows for all employees who left before 1991 are stored in partition p0, those who left from 1991 through 1995 in p1, those who left from 1996 through 2000 in p2, and those who left after 2000 in p3.
RANGE partitioning is especially useful in the following situations:
      1) You need to delete "old" data. With the partitioning scheme in the most recent example, you can simply use "ALTER TABLE employees DROP PARTITION p0;" to delete all rows for employees who stopped working before 1991. For a table with many rows, this is much more efficient than running a DELETE query such as "DELETE FROM employees WHERE YEAR(separated) <= 1990;".
      2) You want to use a column that contains date or time values, or values arising from some other series.
      3) You frequently run queries that depend directly on the column used for partitioning. For example, when executing a query such as "SELECT COUNT(*) FROM employees WHERE YEAR(separated) = 2000 GROUP BY store_id;", MySQL can quickly determine that only partition p2 needs to be scanned, because the other partitions cannot contain any rows that match the WHERE clause.
Note: When this text was written, this optimization was not yet enabled in the MySQL 5.1 source code, although work was in progress. Later MySQL releases provide it as partition pruning.
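The reasoning in item 3 (deciding which partitions a query can skip) can be sketched in Python for the YEAR(separated) scheme. This is a simplified illustration under the assumption that the query filters on a closed range of years; partitions_to_scan is a hypothetical helper, not a MySQL function.

```python
import bisect

# VALUES LESS THAN bounds of the YEAR(separated) example; inf stands in for MAXVALUE.
BOUNDS = [1991, 1996, 2001, float("inf")]
NAMES = ["p0", "p1", "p2", "p3"]

def partitions_to_scan(first_year, last_year):
    """Partitions that can hold rows with first_year <= YEAR(separated) <= last_year."""
    lo = bisect.bisect_right(BOUNDS, first_year)  # partition holding first_year
    hi = bisect.bisect_right(BOUNDS, last_year)   # partition holding last_year
    return NAMES[lo:hi + 1]

print(partitions_to_scan(2000, 2000))  # WHERE YEAR(separated) = 2000
print(partitions_to_scan(1990, 1995))  # WHERE YEAR(separated) BETWEEN 1990 AND 1995
```

Every partition outside the returned slice is guaranteed to contain no matching rows, so it never has to be read.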
LIST partitioning
      LIST partitioning is similar to RANGE partitioning, except that the partition is selected based on the column value matching one of a set of discrete values.
      LIST partitioning is done with "PARTITION BY LIST(expr)", where "expr" is a column value or an expression based on a column value that returns an integer. Each partition is then defined with "VALUES IN (value_list)", where "value_list" is a comma-separated list of integers.
Note: In MySQL 5.1, LIST partitioning can match only against lists of integers.
The examples in this section start from the following basic table definition:
Sql code 
CREATE TABLE employees ( 
    id INT NOT NULL, 
    fname VARCHAR(30), 
    lname VARCHAR(30), 
    hired DATE NOT NULL DEFAULT '1970-01-01', 
    separated DATE NOT NULL DEFAULT '9999-12-31', 
    job_code INT, 
    store_id INT 
); 

Suppose there are 20 video stores distributed among 4 franchise regions, as shown in the following table:
=====================================
Region     Store ID numbers
-------------------------------------
North      3, 5, 6, 9, 17
East       1, 2, 10, 11, 19, 20
West       4, 12, 13, 14, 18
Central    7, 8, 15, 16
=====================================
To partition the table so that rows for stores in the same region are stored in the same partition, you can use the following CREATE TABLE statement:
Sql code
CREATE TABLE employees ( 
    id INT NOT NULL, 
    fname VARCHAR(30), 
    lname VARCHAR(30), 
    hired DATE NOT NULL DEFAULT '1970-01-01', 
    separated DATE NOT NULL DEFAULT '9999-12-31', 
    job_code INT, 
    store_id INT
)
PARTITION BY LIST(store_id) (
    PARTITION pNorth VALUES IN (3,5,6,9,17), 
    PARTITION pEast VALUES IN (1,2,10,11,19,20), 
    PARTITION pWest VALUES IN (4,12,13,14,18), 
    PARTITION pCentral VALUES IN (7,8,15,16) 
); 

This makes it easy to add or delete employee records for a specific region. For example, suppose all the video stores in the West region are sold to another company. All rows for employees working at stores in that region can then be deleted with "ALTER TABLE employees DROP PARTITION pWest;", which has the same effect as the query "DELETE FROM employees WHERE store_id IN (4,12,13,14,18);" but is much more efficient.
Important: If you try to insert a row whose column value (or partitioning expression return value) is not in any partition's value list, the INSERT query fails with an error. For example, given the LIST partitioning scheme above, the following query fails:
Sql code 
INSERT INTO employees VALUES(224, 'Linus', 'Torvalds', '2002-05-01', '2004-10-12', 42, 21); 

This is because the store_id value 21 is not found in any of the value lists used to define partitions pNorth, pEast, pWest, or pCentral. Note that LIST partitioning has no catchall definition comparable to "VALUES LESS THAN MAXVALUE". Every value to be matched must appear in one of the value lists.
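The lookup rule for LIST partitioning, including the failure for an unlisted value, can be sketched in Python. This is a rough model of the behavior described here, not MySQL code; list_partition and the two dictionaries are names made up for illustration.

```python
# VALUES IN lists from the regional example.
PARTITIONS = {
    "pNorth": (3, 5, 6, 9, 17),
    "pEast": (1, 2, 10, 11, 19, 20),
    "pWest": (4, 12, 13, 14, 18),
    "pCentral": (7, 8, 15, 16),
}

# Invert the definition so that each store_id maps to its partition name.
STORE_TO_PARTITION = {
    store_id: name for name, stores in PARTITIONS.items() for store_id in stores
}

def list_partition(store_id):
    """Return the partition whose value list contains store_id."""
    try:
        return STORE_TO_PARTITION[store_id]
    except KeyError:
        # There is no catchall list, so an unlisted value is an error.
        raise ValueError(f"no partition for value {store_id}") from None

print(list_partition(12))
```

Calling list_partition(21) raises the error, which corresponds to the failed INSERT for store 21.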

Like RANGE partitioning, LIST partitioning can be combined with HASH or KEY partitioning to produce composite partitioning (subpartitioning).
HASH partitioning
       With HASH partitioning, the partition is selected based on the value returned by a user-defined expression that operates on column values of the rows to be inserted into the table. The expression can be any expression valid in MySQL that yields a non-negative integer value.
      To partition a table using HASH partitioning, add a "PARTITION BY HASH (expr)" clause to the CREATE TABLE statement, where "expr" is an expression that returns an integer. It can simply be the name of a column of one of MySQL's integer types. In addition, you will most likely want to add a "PARTITIONS num" clause, where num is a positive integer giving the number of partitions the table is divided into.
Sql code 
CREATE TABLE employees ( 
    id INT NOT NULL, 
    fname VARCHAR(30), 
    lname VARCHAR(30), 
    hired DATE NOT NULL DEFAULT '1970-01-01', 
    separated DATE NOT NULL DEFAULT '9999-12-31', 
    job_code INT, 
    store_id INT
)
PARTITION BY HASH(store_id) 
PARTITIONS 4; 
If no PARTITIONS clause is included, the number of partitions defaults to 1. Exception: For NDB Cluster tables, the default number of partitions is the same as the number of cluster data nodes, possibly adjusted to take any MAX_ROWS setting into account so that all rows fit into the partitions.
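With regular HASH partitioning, the partition number is the value of the partitioning expression modulo the number of partitions. A minimal Python sketch of that rule follows; hash_partition is a hypothetical helper, and the expression value is assumed to be a non-negative integer, as the text requires.

```python
def hash_partition(expr_value, num_partitions):
    """Partition number under regular HASH partitioning: MOD(expr, num)."""
    return expr_value % num_partitions

# Distribution of stores 1..20 over the 4 partitions of the example above.
layout = {n: [] for n in range(4)}
for store_id in range(1, 21):
    layout[hash_partition(store_id, 4)].append(store_id)

for partition_number, stores in layout.items():
    print(partition_number, stores)
```

Consecutive store_id values end up in different partitions, which is why this scheme spreads evenly distributed keys evenly across the table.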
LINEAR HASH partitioning
MySQL also supports linear hashing, which differs from regular hashing in that it uses a linear powers-of-two algorithm, whereas regular hashing uses the modulus of the hashing function's value.
The only syntactic difference between linear hash partitioning and regular hash partitioning is the addition of the LINEAR keyword in the PARTITION BY clause.
Sql code 
CREATE TABLE employees ( 
    id INT NOT NULL, 
    fname VARCHAR(30), 
    lname VARCHAR(30), 
    hired DATE NOT NULL DEFAULT '1970-01-01', 
    separated DATE NOT NULL DEFAULT '9999-12-31', 
    job_code INT, 
    store_id INT
)
PARTITION BY LINEAR HASH(YEAR(hired)) 
PARTITIONS 4; 

Given an expression expr, a row stored under linear hashing goes into partition number N out of num partitions, where N is derived by the following algorithm:
1. Find the smallest power of 2 that is greater than or equal to num. Call this value V; it can be calculated as:
   V = POWER(2, CEILING(LOG(2, num)))
   (For example, suppose num is 13. Then LOG(2,13) is 3.7004397181411, CEILING(3.7004397181411) is 4, and V = POWER(2,4), which is 16.)
2. Set N = F(column_list) & (V - 1).
3. While N >= num:
   Set V = CEIL(V / 2)
   Set N = N & (V - 1)
For example, suppose table t1, using linear hash partitioning with 6 partitions, is created with the following statement:
CREATE TABLE t1 (col1 INT, col2 CHAR(5), col3 DATE)
    PARTITION BY LINEAR HASH( YEAR(col3) )
    PARTITIONS 6;
Now suppose you want to insert two rows into t1, one with a col3 value of '2003-04-14' and the other with a col3 value of '1998-10-19'. The partition for the first row is determined as follows:
V = POWER(2, CEILING(LOG(2,6))) = 8
N = YEAR('2003-04-14') & (8 - 1)
  = 2003 & 7
  = 3
(3 >= 6 is FALSE: the row is stored in partition #3)
The partition number for the second row is calculated as follows:
V = 8
N = YEAR('1998-10-19') & (8 - 1)
  = 1998 & 7
  = 6
(6 >= 6 is TRUE: an additional step is required)
V = CEIL(8 / 2) = 4
N = 6 & (4 - 1)
  = 6 & 3
  = 2
(2 >= 6 is FALSE: the row is stored in partition #2)
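The linear hashing steps can be written out as a short Python function. This is a sketch of the algorithm as described in this section, not MySQL's source; the function names are made up, and V is found with an integer loop instead of LOG and CEILING to avoid floating-point rounding.

```python
def smallest_power_of_two_at_least(num):
    """V: the smallest power of 2 that is greater than or equal to num."""
    v = 1
    while v < num:
        v *= 2
    return v

def linear_hash_partition(expr_value, num):
    """Partition number N for a non-negative expr_value and num partitions."""
    v = smallest_power_of_two_at_least(num)
    n = expr_value & (v - 1)
    while n >= num:
        # N points past the last partition: halve V and mask again.
        v //= 2
        n &= v - 1
    return n

# The two rows of the t1 example, PARTITIONS 6, keyed on YEAR(col3).
print(linear_hash_partition(2003, 6))
print(linear_hash_partition(1998, 6))
```

The loop always terminates, because once V reaches 1 the mask is 0 and N becomes 0, which is a valid partition number for any num of at least 1.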
The advantage of linear hash partitioning is that adding, dropping, merging, and splitting partitions is much faster, which benefits tables holding extremely large amounts of data (on the order of 1000 gigabytes). The disadvantage is that data is less likely to be evenly distributed across partitions than with regular HASH partitioning.
KEY partitioning
KEY partitioning is similar to HASH partitioning, except that you supply only one or more columns to be evaluated and the MySQL server provides its own hash function. The early MySQL 5.1 text this article follows requires one or more of the columns to contain integer values; later MySQL releases accept other column types as well.
Sql code 
CREATE TABLE tk ( 
    col1 INT NOT NULL, 
    col2 CHAR(5), 
    col3 DATE
)
PARTITION BY LINEAR KEY (col1) 
PARTITIONS 3; 

Using the LINEAR keyword with KEY partitioning has the same effect as with HASH partitioning: the partition number is derived using the powers-of-two algorithm rather than modulo arithmetic.
