MySQL query specification and design

First, database naming conventions

·  All database object names must use lowercase letters, with words separated by underscores

·  Database object names must not use MySQL reserved keywords (if a name does contain a keyword, it must be quoted with backticks in queries)

·  Database object names should make their purpose clear at a glance, and should preferably not exceed 32 characters

·  Temporary tables must be prefixed with tmp_ and suffixed with the date; backup tables must be prefixed with bak_ and suffixed with the date (timestamp)

·  Columns that store the same data must have the same name and type in every table (these are usually join columns; if the types differ, MySQL performs an implicit type conversion during the query, which prevents the index on the column from being used and lowers query efficiency)
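The naming rules above can be checked mechanically. The following is a minimal sketch in Python, not part of the original specification: the function name check_name, the tiny reserved-word sample and the assumption that the date suffix is eight digits (YYYYMMDD) are all illustrative choices.

```python
import re

# Assumed reading of the rules: lowercase letters, digits and underscores only.
NAME_RE = re.compile(r"^[a-z][a-z0-9_]*$")
# Temporary/backup tables: tmp_ or bak_ prefix and an 8-digit date suffix (assumed YYYYMMDD).
TMP_BAK_RE = re.compile(r"^(tmp|bak)_[a-z0-9_]+_\d{8}$")
# A deliberately tiny sample of reserved words; the real list is much longer.
RESERVED_SAMPLE = {"select", "table", "order", "group", "index", "key"}
MAX_LEN = 32


def check_name(name):
    """Return the list of naming rules that `name` violates."""
    problems = []
    if not NAME_RE.match(name):
        problems.append("not lowercase_with_underscores")
    if name in RESERVED_SAMPLE:
        problems.append("reserved keyword")
    if len(name) > MAX_LEN:
        problems.append("longer than 32 characters")
    if name.startswith(("tmp_", "bak_")) and not TMP_BAK_RE.match(name):
        problems.append("tmp_/bak_ table without date suffix")
    return problems


print(check_name("order_detail"))        # no violations
print(check_name("OrderDetail"))         # violates the lowercase rule
print(check_name("tmp_order_20190801"))  # no violations
print(check_name("tmp_order"))           # missing date suffix
```

A check like this could run in a schema review step before a table is created.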

Second, basic database design specifications

1, all tables must use the InnoDB storage engine

Unless there is a special requirement that InnoDB cannot meet (such as column storage or spatial data storage), all tables must use the InnoDB storage engine. MyISAM was the default engine before MySQL 5.5; InnoDB has been the default since 5.5. InnoDB supports transactions and row-level locking, recovers better after a failure, and performs better under high concurrency.

2, databases and tables must use the same character set, UTF-8

This gives better compatibility. A unified character set avoids the garbled text produced by character set conversion, and comparing values stored in different character sets requires a conversion first, which prevents the index from being used.

3, all tables and fields need a comment

Use the COMMENT clause to add notes to tables and columns, and maintain the data dictionary from the very beginning.

4, try to control the amount of data in a single table; keeping it under 5 million rows is recommended

5 million rows is not a MySQL limit, but beyond that size, changing the table structure, backing up and restoring all become seriously problematic.

The data volume can be controlled by archiving historical data (suitable for log data), by splitting into separate databases and tables (suitable for business data), and by similar means.

5, use MySQL partitioned tables with care

A partitioned table is physically stored as multiple files but logically behaves as a single table. Choose the partition key carefully, because queries that span partitions may be less efficient. Managing large data through physically separate tables is the recommended approach.

6, separate hot and cold data as far as possible and reduce the table width

MySQL limits each table to 4096 columns, and each row may not exceed 65535 bytes. A narrower table reduces disk IO and keeps the memory cache hit ratio for hot data high (the wider the table, the more memory it occupies when loaded into the buffer pool, and the more IO it consumes). To use the cache more efficiently and avoid reading useless cold data, put columns that are frequently used together into the same table (this avoids extra join operations).

7, do not create reserved fields in tables

It is hard to give a reserved field a meaningful name. The data type it will eventually store cannot be known in advance, so a suitable type cannot be chosen, and changing the type of a reserved field later locks the table.

8, do not store large binary data such as pictures and files in the database

Such files are usually large, so the data volume grows rapidly in a short time. Reading them from the database usually requires many random IO operations, which are very time-consuming when the files are large. Store the files on a file server and keep only the file address in the database.

9, do not run stress tests against the online database

10, do not connect to the production database directly from the development or test environment

Third, database field design specifications

1, prefer the smallest data type that meets the storage requirement

 · Reason

The larger a column is, the more space its index needs, so fewer index nodes fit in one page. More IO is then needed to traverse the index, and index performance gets worse.

 · Methods

1) Store strings as numeric types where possible, for example by converting IP addresses into integers.

MySQL provides two functions for handling IP addresses:

inet_aton converts an IP address into an unsigned integer (4-8)

inet_ntoa converts the integer back into an IP address

Before inserting data, convert the IP address into an integer with inet_aton to save space. When displaying data, use inet_ntoa to convert the integer back into an address.
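The mapping performed by inet_aton and inet_ntoa for ordinary dotted-quad addresses can also be reproduced on the application side. This is a small sketch using Python's standard ipaddress module; the helper names ip_to_int and int_to_ip are illustrative, not MySQL functions.

```python
import ipaddress


def ip_to_int(ip):
    """Dotted-quad IPv4 string -> unsigned 32-bit integer (same mapping as inet_aton)."""
    return int(ipaddress.IPv4Address(ip))


def int_to_ip(value):
    """Unsigned 32-bit integer -> dotted-quad IPv4 string (same mapping as inet_ntoa)."""
    return str(ipaddress.IPv4Address(value))


n = ip_to_int("192.168.1.1")
print(n)             # 3232235777
print(int_to_ip(n))  # 192.168.1.1
```

The integer fits in a 4-byte unsigned column, while the longest text form ("255.255.255.255") needs 15 characters.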

2) For non-negative data (such as auto-increment IDs and integer IPs), prefer unsigned integers

Because an unsigned type offers twice the positive value range of the signed type in the same storage space:

  SIGNED INT -2147483648~2147483647

  UNSIGNED INT 0~4294967295
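The two ranges follow directly from the 32 bits of an INT. A short sketch of the arithmetic, with int_range as an illustrative helper name:

```python
def int_range(bits, signed):
    """Return (min, max) for an integer stored in `bits` bits."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


print(int_range(32, signed=True))   # (-2147483648, 2147483647)
print(int_range(32, signed=False))  # (0, 4294967295)
```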

N in VARCHAR(N) is the number of characters, not the number of bytes

Storing 255 characters in a UTF8 VARCHAR(255) can take up to 765 bytes (3 bytes per character). An excessive length consumes more memory.
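The difference between characters and bytes is easy to see by encoding a string. A minimal sketch, assuming 3-byte characters such as common Chinese characters; utf8_bytes is an illustrative helper name:

```python
def utf8_bytes(text):
    """Number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


print(utf8_bytes("a" * 255))   # 255: ASCII letters take 1 byte each
print(utf8_bytes("中" * 255))  # 765: these characters take 3 bytes each
```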

2, avoid TEXT and BLOB data types; the most common TEXT type can store 64 KB of data

·  Move BLOB or TEXT columns into a separate extension table

MySQL in-memory temporary tables do not support large data types such as TEXT and BLOB. If a query involves such data, sorting and similar operations cannot use an in-memory temporary table and must use an on-disk temporary table.

For this kind of data MySQL also has to perform a second lookup, so SQL performance becomes poor. This does not mean that such data types can never be used.

If you must use them, move the BLOB or TEXT columns into a separate extension table, never query with select *, fetch only the columns you need, and do not query the TEXT column when its data is not needed.

·  TEXT or BLOB columns can only use prefix indexes

Because MySQL limits the length of indexed fields, a TEXT column can only have a prefix index, and a TEXT column cannot have a default value.

3, avoid the ENUM type

·  Modifying ENUM values requires an ALTER statement

·  ORDER BY on an ENUM column is inefficient and needs extra operations

·  Do not use numeric values as ENUM enumeration values

4, define all columns as NOT NULL wherever possible

Reason:

·  Indexing a column that allows NULL requires extra space, so the index takes up more room;

·  NULL values need special handling in comparisons and calculations

5, use TIMESTAMP (4 bytes) or DATETIME (8 bytes) to store time

TIMESTAMP stores times from 1970-01-01 00:00:01 to 2038-01-19 03:14:07

TIMESTAMP occupies 4 bytes, the same as INT, but is more readable than INT

Use DATETIME for values outside the TIMESTAMP range.

People often store dates in a string type (this is not the right way):

·  Shortcoming 1: date functions cannot be used for calculation and comparison

·  Shortcoming 2: storing dates as strings takes up more space

6, finance-related amounts must use the decimal type

·  Imprecise floating point: float, double

·  Precise fixed point: decimal

Decimal is an exact type and does not lose precision in calculations. The space it occupies is determined by the declared width: every 4 bytes store 9 digits, and the decimal point takes one byte. It can also be used to store integers larger than bigint.
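The same difference between binary floating point and exact decimal arithmetic exists in most programming languages. This is a small illustration with Python's standard decimal module, not a statement about MySQL's internal storage:

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_total = 0.1 + 0.2
# Decimal values built from strings are exact.
decimal_total = Decimal("0.1") + Decimal("0.2")

print(float_total == 0.3)               # False
print(decimal_total == Decimal("0.3"))  # True
```

For monetary amounts, an error of this kind accumulates across many rows, which is why an exact type is required.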

Fourth, index design specifications

1, limit the number of indexes per table; no more than five indexes on a single table is recommended

More indexes are not always better! Indexes can improve efficiency, and they can also reduce it.

Indexes can speed up queries, but they also slow down inserts and updates, and in some cases they even slow down queries.

When the MySQL optimizer chooses how to optimize a query, it evaluates every usable index based on statistics in order to generate the best execution plan. If many indexes can be used for the query at the same time, the optimizer takes longer to generate the plan, which also reduces query performance.

2, do not create a separate index on every column of a table

Before version 5.6, one SQL statement could use only one index per table. Since 5.6 there is the index merge optimization, but it is still far less effective than a query that uses a composite index.

3, every InnoDB table must have a primary key

InnoDB uses index-organized tables: the logical order in which the data is stored is the same as the order of the index.

A table can have multiple indexes, but its rows can be stored in only one order, and InnoDB organizes the table in primary key order.

Do not use frequently updated columns as the primary key, and do not use multi-column primary keys (equivalent to a composite index). Do not use UUID, MD5, HASH or string columns as the primary key (they cannot guarantee that the data grows sequentially).

An auto-increment ID is recommended as the primary key.

Fifth, recommendations for which columns to index

·  Columns that appear in the WHERE clause of SELECT, UPDATE and DELETE statements

·  Fields included in ORDER BY, GROUP BY and DISTINCT

Do not create a separate index for every column that matches the two points above; it is usually better to create a composite index on those fields

·  Join columns in multi-table joins

Sixth, how to choose the column order of an index

The purpose of an index is to find data through the index, reducing random IO and increasing query performance. The more data the index can filter out, the less data has to be read from disk.

·  Put the column with the highest selectivity on the far left of the composite index (selectivity = number of distinct values in the column / total number of rows);

·  Put columns with a small field length on the left of the composite index where possible (the smaller the field length, the more entries fit in one page and the better the IO performance);

·  Put the most frequently used column on the left of the composite index (so that fewer indexes need to be created).
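The selectivity formula in the first point can be tried on sample data before deciding on a column order. A minimal sketch, where the sample values and the helper names selectivity and order_by_selectivity are hypothetical:

```python
def selectivity(values):
    """Distinct values divided by total rows; closer to 1.0 means more selective."""
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)


def order_by_selectivity(columns):
    """Sort column names so the most selective column comes first."""
    return sorted(columns, key=lambda name: selectivity(columns[name]), reverse=True)


# Hypothetical sample rows for two columns of the same table.
sample = {
    "gender": ["m", "f", "m", "f", "m", "f", "m", "f"],
    "user_id": [1, 2, 3, 4, 5, 6, 7, 7],
}
print(selectivity(sample["gender"]))   # 0.25
print(selectivity(sample["user_id"]))  # 0.875
print(order_by_selectivity(sample))    # ['user_id', 'gender']
```

Selectivity is only one of the three criteria; field length and how often a column is used in queries still have to be weighed by hand.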

Seventh, avoid creating redundant and duplicate indexes

They increase the time the query optimizer needs to generate an execution plan.

·  Duplicate index example: primary key(id), index(id), unique index(id)

·  Redundant index example: index(a,b,c), index(a,b), index(a)
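An index is redundant in the sense of the example above when its column list is a leftmost prefix of another index. A rough sketch of that check follows; the index names and the find_redundant helper are hypothetical, and the check ignores uniqueness constraints, which a real review would need to consider.

```python
def find_redundant(indexes):
    """Return (redundant, covered_by) pairs where one index is a leftmost prefix of another."""
    pairs = []
    for name, cols in indexes.items():
        for other, other_cols in indexes.items():
            if name == other:
                continue
            is_prefix = other_cols[:len(cols)] == cols
            # For exact duplicates, report only one direction (ordered by name).
            if is_prefix and (len(cols) < len(other_cols) or name > other):
                pairs.append((name, other))
    return pairs


# Hypothetical index definitions: name -> ordered tuple of columns.
indexes = {
    "idx_abc": ("a", "b", "c"),
    "idx_ab": ("a", "b"),
    "idx_a": ("a",),
    "idx_b": ("b",),
}
for redundant, covered_by in find_redundant(indexes):
    print(redundant, "is covered by", covered_by)
```

Note that idx_b is not reported: column b is not a leftmost prefix of (a, b, c).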

Eighth, prefer covering indexes

For frequently run queries, prefer a covering index.

Covering index: an index that contains all the fields of the query (the fields in where, select, order by and group by)

Benefits of a covering index:

·  Avoids the second lookup through the InnoDB primary key

InnoDB stores data in clustered index order. In InnoDB, the leaf nodes of a secondary index store the primary key of the row.

So when data is queried through a secondary index, after the key is found, a second lookup by primary key is needed to get the actual data. With a covering index, all the required data can be obtained from the key values of the secondary index, which avoids the second lookup by primary key, reduces IO operations and improves query efficiency.

·  Turns random IO into sequential IO, speeding up queries

Because a covering index is stored in key order, an IO-intensive range lookup needs far less IO than reading each row from disk at random, so a covering index can turn random disk reads into sequential reads of the index.

Ninth, index SET specification

Avoid using foreign key constraints

·  Foreign key constraints are not recommended, but the columns that relate one table to another must be indexed;

·  Foreign keys can ensure the referential integrity of data, but it is recommended to enforce this on the business side;

·  Foreign keys affect writes to both the parent and child tables and thereby reduce performance.

Tenth, SQL development specifications

1, use prepared statements for database operations

Prepared statements can reuse execution plans, reducing the time spent compiling SQL, and they also solve the SQL injection problem caused by dynamic SQL. Passing only parameters is more efficient than passing the full SQL statement, and the same statement can be parsed once and used many times, which improves processing efficiency.
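The injection point can be illustrated with any driver that supports parameter binding. The sketch below uses Python's built-in sqlite3 module as a stand-in for a MySQL driver (an assumption made so the example is self-contained); the customer table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customer (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])

malicious = "nobody' OR '1'='1"

# Dynamic SQL: the input becomes part of the statement and changes its meaning.
unsafe_sql = "SELECT id FROM customer WHERE name = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Parameterized SQL: the statement text is fixed and the input is passed as data.
safe_rows = conn.execute("SELECT id FROM customer WHERE name = ?",
                         (malicious,)).fetchall()

print(len(unsafe_rows))  # 2: the injected OR clause matched every row
print(len(safe_rows))    # 0: no customer has that literal name
```

MySQL drivers typically use a different placeholder syntax, but the principle of sending the statement and the values separately is the same.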

2, avoid implicit data type conversion

Implicit conversion can prevent the index from being used. For example: select name, phone from customer where id = '111';

3, make full use of the indexes that already exist on the table

·  Avoid query conditions with a % on both sides.

For example a like '%123%' (if there is no leading % and only a trailing %, the index on the column can be used)

·  One SQL statement can use only one column of a composite index for a range query

For example, with a composite index on columns a, b, c, if the query conditions contain a range query on column a, the index on columns b and c will not be used. When defining a composite index, if column a will be used for range lookups, put column a on the right side of the composite index.

·  Use left join or not exists to optimize not in operations

Because not in also often prevents the index from being used.
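The three forms return the same rows when the compared column cannot be NULL, so the rewrite is safe in that case. The sketch below checks this on a small data set; it uses Python's built-in sqlite3 module as a stand-in for MySQL, the customer and orders tables are hypothetical, and it demonstrates only equivalence of results, not the performance difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL);
    INSERT INTO customer (id, name) VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO orders (id, customer_id) VALUES (10, 1), (11, 1), (12, 3);
""")

not_in = """
    SELECT c.id FROM customer c
    WHERE c.id NOT IN (SELECT o.customer_id FROM orders o)
    ORDER BY c.id"""
left_join = """
    SELECT c.id FROM customer c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.id IS NULL
    ORDER BY c.id"""
not_exists = """
    SELECT c.id FROM customer c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY c.id"""

results = [conn.execute(sql).fetchall() for sql in (not_in, left_join, not_exists)]
print(results)  # each query returns [(2,)]: only bob has no orders
```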

4, design the database with future expansion in mind

5, programs must use different accounts to connect to different databases; cross-database queries are prohibited

·  Leaves room for database migration and for splitting into separate databases and tables

·  Reduces coupling between business services

·  Avoids security risks caused by excessive privileges

6, do not use SELECT *; queries must use SELECT <field list>

Reason:

·  SELECT * consumes more CPU, IO and network bandwidth

·  SELECT * cannot use a covering index

·  An explicit field list reduces the impact of table structure changes

7, do not use INSERT statements without a field list

For example: insert into t values ('a','b','c');

Use instead: insert into t (c1, c2, c3) values ('a', 'b', 'c');
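The risk of the first form shows up as soon as the table structure changes. The sketch below reproduces this with Python's built-in sqlite3 module as a stand-in for MySQL (the exact error differs between databases, but both reject a value count that does not match the column count); the added column c4 is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c1 TEXT, c2 TEXT, c3 TEXT)")

# Both forms work while the table has exactly three columns.
conn.execute("INSERT INTO t VALUES ('a', 'b', 'c')")
conn.execute("INSERT INTO t (c1, c2, c3) VALUES ('a', 'b', 'c')")

# A later schema change adds a column.
conn.execute("ALTER TABLE t ADD COLUMN c4 TEXT")

try:
    conn.execute("INSERT INTO t VALUES ('a', 'b', 'c')")
    positional_ok = True
except sqlite3.OperationalError:
    positional_ok = False

# The statement with an explicit field list is unaffected by the new column.
conn.execute("INSERT INTO t (c1, c2, c3) VALUES ('a', 'b', 'c')")
row_count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]

print(positional_ok)  # False
print(row_count)      # 3
```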

8, avoid subqueries; they can be rewritten as join operations

Usually, when the subquery is in an in clause and is a simple SQL statement (with no union, group by, order by or limit clause), it can be converted into a join query for optimization.

Reasons why subqueries perform poorly:

·  The result set of a subquery cannot use an index. It is usually stored in a temporary table, and neither in-memory nor on-disk temporary tables have indexes, so query performance suffers to some degree;

·  The larger the result set returned by the subquery, the greater its impact on query performance;

·  Because subqueries produce many temporary tables without indexes, they consume excessive CPU and IO resources and result in many slow queries.

9, avoid joining too many tables

MySQL has a join buffer, whose size is set by the join_buffer_size parameter.

In MySQL, each additional table joined in the same SQL statement is allocated another join buffer, so the more tables a statement joins, the more memory it uses.

If a program makes heavy use of multi-table joins and join_buffer_size is set unreasonably, the server can easily run out of memory, which affects the stability of the database server.

Join operations also create temporary tables, which hurts query efficiency. MySQL allows at most 61 tables in a join; no more than five is recommended.

10, reduce the number of round trips to the database

The database is better suited to batch processing; merging multiple identical operations into one batch improves processing efficiency.
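One common way to merge identical operations is a single multi-row INSERT instead of one statement per row. The sketch below builds such a statement; build_multi_insert is a hypothetical helper, the table t is illustrative, and Python's built-in sqlite3 module stands in for MySQL (whose drivers typically use a different placeholder syntax).

```python
import sqlite3


def build_multi_insert(table, columns, row_count):
    """Build one INSERT statement with `row_count` placeholder groups."""
    group = "(" + ", ".join("?" for _ in columns) + ")"
    return "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join([group] * row_count))


rows = [(1, "a"), (2, "b"), (3, "c")]
sql = build_multi_insert("t", ["id", "val"], len(rows))
print(sql)  # INSERT INTO t (id, val) VALUES (?, ?), (?, ?), (?, ?)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
# Flatten the rows into one parameter list and send a single statement.
conn.execute(sql, [value for row in rows for value in row])
inserted = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(inserted)  # 3
```

The table and column names are interpolated into the SQL text here, so they must come from trusted code, never from user input.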

11, use in instead of or for multiple conditions on the same column

The number of values in an in list should not exceed 500. An in operation can make more effective use of the index, whereas or rarely uses the index in most cases.

12, do not use order by rand() for random ordering

It loads all qualifying rows of the table into memory, generates a random value for each row, and then sorts all the rows in memory by that value. If the qualifying data set is very large, this consumes a lot of CPU, IO and memory.

It is recommended to obtain a random value in the program and then fetch the data from the database with that value.
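One way to follow this recommendation is to pick a random id in the program and fetch the first row at or after it through the primary key. This is only a sketch under stated assumptions: the item table is hypothetical, Python's built-in sqlite3 module stands in for MySQL, and rows that follow a gap in the id sequence are picked more often than others.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
# Ids 1..100 with gaps at multiples of 7, to mimic deleted rows.
conn.executemany("INSERT INTO item (id, name) VALUES (?, ?)",
                 [(i, "item-%d" % i) for i in range(1, 101) if i % 7 != 0])


def random_row(conn):
    """Pick a random id in the program, then fetch the first row at or after it."""
    low, high = conn.execute("SELECT MIN(id), MAX(id) FROM item").fetchone()
    target = random.randint(low, high)
    return conn.execute(
        "SELECT id, name FROM item WHERE id >= ? ORDER BY id LIMIT 1",
        (target,)).fetchone()


row = random_row(conn)
print(row)
```

Both queries use the primary key, so no full scan or in-memory sort of the table is required.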

13, do not apply functions or calculations to columns in the WHERE clause

Applying a function or calculation to a column prevents the index on that column from being used.

 · Not recommended:

where date(create_time)='20190101'

 · Recommended:

where create_time >= '20190101' and create_time < '20190102'
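The bounds of the recommended range can be computed in the program, so that the column itself is never wrapped in a function. A minimal sketch with Python's datetime module; day_range is an illustrative helper and the YYYYMMDD format follows the literals used above.

```python
from datetime import datetime, timedelta


def day_range(day):
    """Return (start, end) strings for the half-open range covering one day (YYYYMMDD)."""
    start = datetime.strptime(day, "%Y%m%d")
    end = start + timedelta(days=1)
    return start.strftime("%Y%m%d"), end.strftime("%Y%m%d")


start, end = day_range("20190101")
print(start, end)  # 20190101 20190102
# The bounds are then passed as parameters to a condition of the form:
#   where create_time >= ? and create_time < ?
```

The half-open form (>= start, < next day) also handles month and year boundaries without special cases.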

 

14, use UNION ALL instead of UNION when there are clearly no duplicate values

·  UNION puts all the data of the two result sets into a temporary table and then removes duplicates

·  UNION ALL does not de-duplicate the result set
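The difference in results is easy to observe on two small tables. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL and hypothetical tables a and b; it shows the de-duplication behaviour only, not the cost of the temporary table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (v INTEGER);
    CREATE TABLE b (v INTEGER);
    INSERT INTO a (v) VALUES (1), (2);
    INSERT INTO b (v) VALUES (2), (3);
""")

union_rows = conn.execute("SELECT v FROM a UNION SELECT v FROM b").fetchall()
union_all_rows = conn.execute("SELECT v FROM a UNION ALL SELECT v FROM b").fetchall()

print(len(union_rows))      # 3: the duplicate 2 was removed
print(len(union_all_rows))  # 4: every row from both sides is kept
```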

15, split large, complex SQL into several smaller SQL statements

·  Large SQL: SQL with complex logic that needs a lot of CPU for calculation

·  In MySQL, one SQL statement can use only one CPU for calculation

·  After splitting, the SQL statements can be executed in parallel to improve processing efficiency

Eleventh, database operation code of conduct

1, batch writes (UPDATE, DELETE, INSERT) of more than 1 million rows must be split into multiple batches

·  Large batch operations can cause serious master-slave delay

In a master-slave environment, large batch operations can cause serious replication delay. Large batch writes generally take a long time to execute, and only after they have finished on the master are they executed on the slaves, so the slaves lag behind the master for a long time.

·  Binlogs in row format generate a large amount of log data

Large batch writes generate a lot of log data, especially with the row binlog format, because the row format records the change to every row. The more rows modified at once, the more log data is generated and the longer it takes to transmit and apply the logs, which is another cause of master-slave delay.

·  Avoid large transactions

Modifying a large amount of data at once is necessarily done in a single transaction, which locks a large amount of data in the table and causes heavy blocking; blocking has a very large impact on MySQL performance.

In particular, long-lasting blocking can use up all available database connections, so that other applications in the production environment cannot connect to the database. Be sure to perform large batch writes in batches.
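Splitting a large write into batches usually means repeating a small, bounded statement and committing after each round. The sketch below shows the loop with Python's built-in sqlite3 module as a stand-in for MySQL; the log table, the delete_in_batches helper and the batch size are hypothetical. The id subquery is used only because stock SQLite builds do not accept LIMIT on DELETE; in MySQL the same loop is usually written with DELETE ... LIMIT directly.

```python
import sqlite3


def delete_in_batches(conn, cutoff, batch_size):
    """Delete rows with id < cutoff in small transactions; return the number of batches."""
    batches = 0
    while True:
        cur = conn.execute(
            "DELETE FROM log WHERE id IN "
            "(SELECT id FROM log WHERE id < ? ORDER BY id LIMIT ?)",
            (cutoff, batch_size))
        conn.commit()  # each batch is its own short transaction
        if cur.rowcount == 0:
            break
        batches += 1
    return batches


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, msg TEXT)")
conn.executemany("INSERT INTO log (id, msg) VALUES (?, ?)",
                 [(i, "m") for i in range(1, 1001)])
conn.commit()

batch_count = delete_in_batches(conn, cutoff=901, batch_size=250)
remaining = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
print(batch_count)  # 4
print(remaining)    # 100
```

In production, a short pause between batches is often added so that replicas can catch up; that is a tuning choice and is left out of the sketch.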

2, use pt-online-schema-change to modify the table structure of large tables

·  Avoids master-slave delay caused by modifying large tables

·  Avoids locking the table while its fields are modified

Be careful when modifying the structure of a large table: it causes serious table locking, which is intolerable, especially in a production environment.

pt-online-schema-change first creates a new table with the same structure as the original table, applies the structure change to the new table, then copies the data from the original table to the new table and adds some triggers to the original table.

The triggers copy data newly written to the original table into the new table as well. After all the rows have been copied, the new table is renamed to the original table's name and the original table is dropped.

In this way one DDL operation is broken down into multiple smaller batches.

3, do not grant the super privilege to accounts used by programs

When the maximum number of connections has been reached, MySQL still allows one more connection from a user with the super privilege. The super privilege must be reserved for the DBA account that deals with problems.

4, accounts that programs use to connect to the database must follow the principle of least privilege

An account used by a program may be used in only one database and may not be used across databases. In principle, accounts used by programs are not allowed to have the drop privilege.

 

Origin www.cnblogs.com/dylan402/p/11301286.html