Essential knowledge for database design: the cornerstone of future advancement

Table of Contents

1. Database naming specifications

2. Basic database design specifications

3. Database field design specifications

4. Index design specifications

5. Common index column recommendations

6. How to choose the order of index columns

7. Avoid creating redundant and duplicate indexes

8. Give priority to covering indexes

9. Index SET specification

10. Database SQL development specifications

11. Code of Conduct for Database Operation

1. Database naming specifications
· All database object names must use lowercase letters separated by underscores

· Database object names must not use MySQL reserved keywords (if a table name contains a keyword, it must be quoted with backquotes in queries)

· Database object names should be self-explanatory, and should not exceed 32 characters

· Temporary tables must be prefixed with tmp_ and suffixed with a date; backup tables must be prefixed with bak_ and suffixed with a date (timestamp)

· Columns that store the same data must have consistent names and types across tables (they are usually join columns; if the joined column types differ, MySQL performs an implicit type conversion during the query, which invalidates the index on the column and reduces query efficiency)
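A minimal sketch of these naming rules (all table and column names here are illustrative, not from the original text):

```sql
-- Lowercase names separated by underscores; names are self-explanatory and <= 32 chars
CREATE TABLE customer_order (
    order_id    BIGINT UNSIGNED NOT NULL COMMENT 'order ID',
    customer_id BIGINT UNSIGNED NOT NULL COMMENT 'same name and type as customer.customer_id'
) ENGINE = InnoDB COMMENT 'customer orders';

-- Temporary and backup tables carry the required prefix plus a date suffix
CREATE TABLE tmp_customer_order_20240101 LIKE customer_order;
CREATE TABLE bak_customer_order_20240101 LIKE customer_order;
```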

2. Basic database design specifications
1. All tables must use the InnoDB storage engine.
Unless there is a special requirement that InnoDB cannot meet (such as column storage or spatial data), all tables must use InnoDB (MyISAM was the default before MySQL 5.5; InnoDB has been the default since 5.5). InnoDB supports transactions and row-level locks, recovers better after a crash, and performs better under high concurrency.
2. The database and table character sets should uniformly use UTF8.
UTF8 has good compatibility, and a unified character set avoids garbled text caused by character set conversion. Comparing columns of different character sets forces a conversion that invalidates indexes.
3. All tables and fields need comments.
Use the COMMENT clause to add table and column remarks from the beginning, and maintain the data dictionary as the schema evolves.
4. Try to control the size of a single table; it is recommended to keep it within 5 million rows.
5 million rows is not a hard MySQL limit, but beyond it, modifying the table structure and performing backup and recovery become problematic. Control data volume with historical data archiving (for log data) and splitting into multiple databases and tables (for business data).
5. Use MySQL partitioned tables with caution.
A partitioned table appears physically as multiple files but logically as a single table. Choose the partition key carefully, since cross-partition queries may be less efficient. It is recommended to manage big data by physically splitting tables instead.
6. Separate hot and cold data as much as possible, and reduce table width.
MySQL limits each table to at most 4096 columns, and each row to at most 65535 bytes. Narrower tables reduce disk IO and keep hot data in the memory cache (the wider the table, the more memory it occupies when loaded into the buffer pool, and the more IO it consumes). Keeping only the frequently used columns together in one table uses the cache more effectively, avoids repeatedly reading useless cold data, and avoids extra join operations.
7. It is forbidden to create reserved fields in tables.
A reserved field's name cannot be self-explanatory, and since its future data type is unknown, no appropriate type can be chosen for it; changing a reserved field's type later will lock the table.
8. It is forbidden to store pictures, files, and other large binary data in the database.
Such files are usually large and cause the data volume to grow rapidly in a short time. Reading them from the database involves many random IO operations, which are time-consuming for big files. Store files on a file server instead, and keep only the file's address information in the database.
9. It is forbidden to do database stress testing online.
10. It is forbidden to connect to the production database directly from development or test environments

3. Database field design specifications
1. Prefer the smallest data type that meets the storage needs.
Reason: the larger a column, the more space its index requires, so fewer index nodes fit in a single page, more IO is needed during traversal, and index performance is worse.
Methods:
1) Convert strings to numeric types for storage, e.g. convert an IP address into an integer.

MySQL provides two functions for handling IP addresses:

Before inserting data, use inet_aton to convert the IP address to an integer, which saves space; when displaying data, use inet_ntoa to convert the stored integer back to the dotted address.
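The two functions can be used directly in SQL, for example:

```sql
-- INET_ATON turns the dotted address into an integer that fits INT UNSIGNED
SELECT INET_ATON('192.168.0.1');  -- 3232235521
-- INET_NTOA converts the stored integer back for display
SELECT INET_NTOA(3232235521);     -- '192.168.0.1'
```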

2) For non-negative data (such as auto-increment IDs and integer IPs), prefer unsigned integers for storage.
Reason: an unsigned integer doubles the positive range of a signed integer of the same size.

· The N in VARCHAR(N) is the number of characters, not the number of bytes.
With UTF8, VARCHAR(255) storing 255 Chinese characters takes up to 765 bytes. An excessive declared length consumes more memory.
2. Avoid the TEXT and BLOB data types. The most common TEXT type can store 64KB of data.
It is recommended to separate BLOB or TEXT columns into a separate extension table.
MySQL in-memory temporary tables do not support large data types such as TEXT and BLOB; if a query includes such data, operations like sorting cannot use an in-memory temporary table and must use a disk temporary table instead.
For such data MySQL also has to perform a second query, which makes SQL performance very poor, but this does not mean such types must never be used.
If you must use them, separate the BLOB or TEXT column into an extension table, do not use SELECT * in queries, retrieve only the necessary columns, and do not select the TEXT column when its data is not needed.
· TEXT and BLOB types can only use prefix indexes.
MySQL limits the length of index fields, so TEXT columns can only use prefix indexes, and a TEXT column cannot have a default value.
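A sketch of both recommendations (the article_detail table, its columns, and the prefix length of 100 are hypothetical):

```sql
-- Move the TEXT column out of the hot table into an extension table
CREATE TABLE article_detail (
    article_id BIGINT UNSIGNED NOT NULL PRIMARY KEY COMMENT 'same ID as the main article table',
    content    TEXT NOT NULL COMMENT 'article body'
) ENGINE = InnoDB COMMENT 'extension table for the article body';

-- A TEXT column can only be indexed by prefix; here the first 100 characters
ALTER TABLE article_detail ADD INDEX idx_content (content(100));
```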
3. Avoid the ENUM type.
· Modifying an ENUM's values requires an ALTER statement;
· ORDER BY on an ENUM column is inefficient and requires extra operations;
· It is forbidden to use numeric values as ENUM enumeration values.
4. Define all columns as NOT NULL whenever possible.
Reasons:
· Indexing a NULL column requires extra space to record the NULLs, so it occupies more space;
· NULL values need special treatment in comparisons and calculations.
5. Use TIMESTAMP (4 bytes) or DATETIME (8 bytes) to store time.
TIMESTAMP covers the range 1970-01-01 00:00:01 to 2038-01-19 03:14:07. It occupies 4 bytes, the same as INT, but is more readable. Use DATETIME for values beyond the TIMESTAMP range.
People often store date data as strings, which is incorrect:
· Disadvantage 1: date functions cannot be used for calculation and comparison;
· Disadvantage 2: strings take more space to store dates.
6. Financial amount data must use the DECIMAL type.
· Non-precise floating point: float, double
· Precise fixed point: decimal
DECIMAL is an exact numeric type and loses no precision in calculations. Its storage is determined by the declared width: every 9 digits take 4 bytes, and the decimal point takes one byte. It can also store integers larger than BIGINT allows.
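For example (the account table and the chosen precision DECIMAL(18, 2) are illustrative):

```sql
CREATE TABLE account (
    account_id BIGINT UNSIGNED NOT NULL PRIMARY KEY COMMENT 'account ID',
    -- DECIMAL(p, s): p total digits, s digits after the decimal point; exact arithmetic
    balance    DECIMAL(18, 2) NOT NULL DEFAULT 0.00 COMMENT 'account balance'
) ENGINE = InnoDB COMMENT 'financial amounts use DECIMAL, never FLOAT/DOUBLE';
```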

4. Index design specifications
1. Limit the number of indexes on each table; it is recommended that a single table have no more than 5.
More indexes are not always better! Indexes can improve efficiency but can also reduce it.
Indexes speed up queries but slow down inserts and updates, and in some cases even slow down queries.
When the MySQL optimizer plans a query, it evaluates every usable index based on its statistics to generate the best execution plan. If many indexes could serve the same query, the optimizer needs more time to generate the plan, which reduces query performance.
2. It is forbidden to create a separate index for every column in a table.
Before version 5.6, one SQL statement could use only one index per table. Since 5.6 there is an index merge optimization, but it is still far inferior to a query that uses a composite (joint) index.
3. Every InnoDB table must have a primary key.
InnoDB is an index-organized table: the logical order of data storage follows the index order.
Each table can have multiple indexes, but the table itself can be stored in only one order; InnoDB organizes the table in primary key order.
Do not use frequently updated columns as the primary key, and do not use multi-column primary keys (which are equivalent to a joint index). Do not use UUID, MD5, hash, or string columns as the primary key (their values do not grow sequentially).
It is recommended to use an auto-increment ID as the primary key.
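The rules of sections 2 and 3 combine into a table definition along these lines (a hypothetical example, not from the original text):

```sql
CREATE TABLE order_item (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'auto-increment surrogate primary key',
    order_id   BIGINT UNSIGNED NOT NULL COMMENT 'order ID',
    sku_id     BIGINT UNSIGNED NOT NULL COMMENT 'product SKU ID',
    quantity   INT UNSIGNED    NOT NULL DEFAULT 1 COMMENT 'purchase quantity',
    created_at DATETIME        NOT NULL COMMENT 'creation time',
    PRIMARY KEY (id)
) ENGINE = InnoDB DEFAULT CHARSET = utf8 COMMENT 'order line items';
```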

5. Common index column recommendations
· Columns that appear in the WHERE clause of SELECT, UPDATE, and DELETE statements;
· Fields used in ORDER BY, GROUP BY, and DISTINCT;
· Do not build a separate index for each of the fields above; a joint index over them is usually better;
· Join columns used in multi-table joins.

6. How to choose the order of index columns
The purpose of an index is to locate data through it, reduce random IO, and improve query performance; the fewer rows an index leaves after filtering, the less data is read from disk.
· Put the most selective column on the leftmost side of the joint index (selectivity = number of distinct values in the column / total number of rows);
· Try to put the column with the smallest field length on the leftmost side of the joint index (the smaller the field, the more entries fit in one page, and the better the IO performance);
· Put the most frequently used columns on the left side of the joint index (so that fewer indexes are needed).

7. Avoid creating redundant and duplicate indexes
They increase the time the query optimizer needs to generate an execution plan.
· Examples of duplicate indexes: primary key(id), index(id), unique index(id)
· Examples of redundant indexes: index(a,b,c), index(a,b), index(a)

8. Give priority to covering indexes
For frequent queries, prefer a covering index.
A covering index is one that contains all the fields a query needs (the fields in SELECT, WHERE, ORDER BY, and GROUP BY). Benefits:
· It avoids a second lookup on the InnoDB primary key index.
InnoDB stores data in clustered-index order, and the leaf nodes of a secondary index store the row's primary key. When data is read through a secondary index, after the key value is found, a second lookup through the primary key is needed to fetch the data we really want. With a covering index, all required data can be obtained from the secondary index itself, which avoids the second lookup on the primary key, reduces IO, and improves query efficiency.
· It can turn random IO into sequential IO, speeding up queries.
Because a covering index is stored in key order, an IO-intensive range search needs far less IO than randomly reading each row from disk; accessing data through a covering index thus converts random disk reads into sequential index reads.
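As a sketch, assuming a customer_order table with customer_id and status columns (names are illustrative):

```sql
-- The query needs only customer_id and status:
--   SELECT customer_id, status FROM customer_order WHERE customer_id = 42;
-- An index on (customer_id, status) covers it; EXPLAIN reports 'Using index',
-- meaning no lookup back into the clustered (primary key) index is needed.
ALTER TABLE customer_order ADD INDEX idx_customer_status (customer_id, status);
```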

9. Index SET specification
Try to avoid foreign key constraints.
· Foreign key constraints (FOREIGN KEY) are not recommended, but an index must be built on the columns that join two tables;
· Foreign keys can ensure referential integrity, but it is recommended to enforce it on the business side;
· Foreign keys affect write operations on both the parent and child tables and reduce performance.

10. Database SQL development specifications
1. It is recommended to use prepared statements for database operations.
Prepared statements reuse execution plans and reduce SQL compilation time, and they also prevent the SQL injection problems caused by dynamic SQL. Passing only parameters is more efficient than passing whole SQL statements, and the same statement can be parsed once and executed many times, improving processing efficiency.
2. Avoid implicit conversion of data types.
Implicit conversion invalidates indexes. For example: select name, phone from customer where id = '111'; (id is numeric, but a string is passed).
3. Make full use of the indexes that already exist on the table.
· Avoid query conditions with both a leading and trailing %.
For example, a like '%123%' cannot use the index on column a; with no leading % and only a trailing % (a like '123%'), the index can be used.
· One SQL statement can apply a range query to only one column of a composite index.
For example, with a joint index on columns (a, b, c), if the query condition contains a range predicate on column a, the index cannot be used for columns b and c. When defining the joint index, if column a is usually searched by range, put it on the rightmost side of the joint index.
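Illustrated with a hypothetical table t that has a joint index on (a, b, c):

```sql
-- The range predicate on a stops index use at column a;
-- the b and c parts of the index cannot filter further:
SELECT id FROM t WHERE a > 10 AND b = 1 AND c = 2;

-- If a is usually searched by range, define the index as (b, c, a) instead
ALTER TABLE t ADD INDEX idx_b_c_a (b, c, a);
```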
· Use LEFT JOIN or NOT EXISTS to optimize NOT IN operations.
NOT IN often prevents the use of indexes.
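A sketch of the rewrite, assuming hypothetical customer and customer_order tables:

```sql
-- NOT IN form:
SELECT id FROM customer
WHERE id NOT IN (SELECT customer_id FROM customer_order);

-- LEFT JOIN form: customers with no orders carry NULL on the right-hand side
SELECT c.id
FROM customer c
LEFT JOIN customer_order o ON o.customer_id = c.id
WHERE o.customer_id IS NULL;
```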
4. When designing the database, you should consider future expansion.
5. Programs connecting to different databases should use different accounts, and cross-database queries are forbidden.
· This leaves room for database migration and for splitting into multiple databases and tables;
· It reduces business coupling;
· It avoids the security risks of over-broad permissions.
6. It is forbidden to use SELECT *; you must use SELECT <field list>.
Reasons:
· SELECT * consumes more CPU, IO, and network bandwidth;
· It cannot use a covering index;
· A field list reduces the impact of table structure changes.
7. It is forbidden to use INSERT statements without a field list.
For example, do not write: insert into t values ('a','b','c');
Instead use: insert into t (c1,c2,c3) values ('a','b','c');
8. Avoid subqueries; subqueries can often be optimized into join operations.
A subquery can be converted into an associated query only when it appears in the IN clause and is simple SQL (no UNION, GROUP BY, ORDER BY, or LIMIT clauses).
Reasons for poor subquery performance:
· The result set of a subquery cannot use an index; it is usually stored in a temporary table, and neither in-memory nor on-disk temporary tables have indexes, so query performance suffers;
· The larger the result set the subquery returns, the greater the impact on query performance;
· Because subqueries generate many temporary tables without indexes, they consume excessive CPU and IO resources and produce many slow queries.
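A sketch of such a conversion (table names are hypothetical; DISTINCT compensates for customers with several orders):

```sql
-- Subquery form:
SELECT id, name FROM customer
WHERE id IN (SELECT customer_id FROM customer_order);

-- Equivalent join form:
SELECT DISTINCT c.id, c.name
FROM customer c
JOIN customer_order o ON o.customer_id = c.id;
```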
9. Avoid JOINs over too many tables.
MySQL has a join buffer whose size is set by the join_buffer_size parameter. For each table joined in one SQL statement, MySQL allocates one more join buffer, so the more tables a statement joins, the more memory it uses.
If the program performs many multi-table joins and join_buffer_size is set unreasonably, server memory can easily overflow, affecting the stability of the server and the database.
Joins also trigger temporary table operations, which affect query efficiency. MySQL allows at most 61 tables in one join; no more than 5 are recommended.
10. Reduce the number of round trips to the database.
The database is better suited to batch operations; combining multiple identical operations into one batch improves processing efficiency.
11. For OR conditions on the same column, use IN instead of OR.
The number of values in an IN list should not exceed 500. IN can use indexes effectively, while OR in most cases rarely does.
12. It is forbidden to use ORDER BY RAND() for random sorting.
ORDER BY RAND() loads all qualifying rows into memory, generates a random value for each row, and sorts them all by that value; if the qualifying data set is large, it consumes a great deal of CPU, IO, and memory.
It is recommended to generate a random value in the program and then fetch the row from the database by that value.
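For instance, if the program has already drawn a random number within the table's id range (123456 below stands in for that program-generated value; table and column names are illustrative):

```sql
-- Instead of: SELECT id, name FROM t ORDER BY RAND() LIMIT 1;
SELECT id, name FROM t WHERE id >= 123456 ORDER BY id LIMIT 1;
```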
13. It is forbidden to apply function conversions or calculations to columns in the WHERE clause.
When a function or calculation is applied to a column, the index on that column cannot be used.
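A sketch of the two forms, assuming a customer_order table with an index on create_time (names are illustrative):

```sql
-- Not recommended: the function call on the column disables the index
SELECT id FROM customer_order WHERE DATE(create_time) = '2024-01-01';

-- Recommended: rewrite the condition as a range so the index can be used
SELECT id FROM customer_order
WHERE create_time >= '2024-01-01' AND create_time < '2024-01-02';
```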
14. Use UNION ALL instead of UNION when it is certain there are no duplicate values.
· UNION puts all rows of both result sets into a temporary table before deduplicating them;
· UNION ALL skips the deduplication step.
15. Split complex, large SQL statements into multiple small ones.
· Big SQL: logically complex SQL that needs a lot of CPU for computation;
· MySQL: one SQL statement can use only one CPU for computation;
· Splitting the SQL allows the pieces to run in parallel, improving processing efficiency.

11. Code of Conduct for Database Operation
1. Bulk write operations (UPDATE, DELETE, INSERT) touching more than 1 million rows must be performed in multiple batches.
· Large bulk operations can cause serious master-slave delay.
A large write generally takes a certain amount of time to execute on the master, and only after it finishes there does it run on the slaves, so the slaves lag behind the master for a long time.
· With the binlog in row format, large operations generate huge logs.
Large bulk writes generate many logs, especially with row-format binary logging: since row format records the modification of every row, the more rows one statement modifies, the more log is produced and the longer log transfer and replay take. This is another cause of master-slave delay.
· Avoid large transactions.
Modifying a large amount of data in a single transaction locks a large number of rows, causing heavy blocking and severely affecting MySQL performance.
Long blocking in particular can exhaust all available database connections, leaving other applications in the production environment unable to connect, so be careful with bulk write operations.
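A sketch of batching a large delete (the access_log table and the batch size of 5000 are illustrative; the program repeats the statement until no rows are affected, pausing between batches so replication can catch up):

```sql
-- Repeat until the affected row count drops to 0
DELETE FROM access_log
WHERE created_at < '2023-01-01'
LIMIT 5000;
```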
2. Use pt-online-schema-change to modify the structure of large tables.
· It avoids master-slave delay caused by altering a large table;
· It avoids locking the table while fields are being modified.
Be cautious when changing the data structure of a large table: it can cause serious table locking, which is intolerable in a production environment.
pt-online-schema-change first creates a new table with the same structure as the original and applies the structure change to the new table; it then copies the data from the original table into the new one, adding triggers on the original table so that newly written rows are also copied over. When all rows have been copied, it renames the new table to the original name and drops the original table, decomposing one DDL operation into many small batches.
3. It is forbidden to grant the SUPER privilege to accounts used by programs.
When the maximum number of connections is reached, a user with the SUPER privilege can still open one more connection. SUPER should be reserved for the accounts DBAs use to handle problems.
4. Database accounts used by programs must follow the principle of least privilege.
An account used by a program should be restricted to a single database; accounts must not be shared across databases, and in principle program accounts must not have DROP permission.

Origin blog.51cto.com/14308901/2551444