A complete MySQL specification, worth bookmarking

1. Introduction

MySQL is a relational database management system (RDBMS) originally developed by the Swedish company MySQL AB and now a product of Oracle. It is one of the most popular relational database management systems and is widely regarded as one of the best RDBMS choices for web applications.

Oracle jumped the major version number directly from MySQL 5.7 to 8.0. MySQL 8.0 brought many major updates and made great strides toward being an enterprise-level database: a new Data Dictionary design that supports Atomic DDL, a new version upgrade strategy, enhanced security and account management, enhanced InnoDB functions, and more. MySQL has provided NoSQL features since version 5.7, and these were greatly improved in MySQL 8.0. According to Oracle, MySQL 8.0 is up to 2 times faster than MySQL 5.7.

2. MySQL conventions

2.1 Comments

1) [Mandatory] All objects in the database must have comments, including: tables, fields, indexes, etc., and must be kept up to date;

2.2 Character set

1) [Mandatory] Use the utf8 character set by default to avoid the risk of garbled characters. Note that in MySQL, utf8 is an alias for utf8mb3 and stores at most 3 bytes per character, so fields that need to store special symbols such as emoji (for example, article content fields) may use utf8mb4;

2) [Mandatory] The collation is utf8_general_ci by default;

2.3 Storage engine

1) [Mandatory] Use the InnoDB storage engine by default;

Note: Since MySQL 5.5, the query performance of the MyISAM engine has not matched that of InnoDB. In addition, InnoDB's primary key lookups are very fast, and it supports transactions, row-level locks, and high concurrency. It is well suited to multi-core CPUs and large databases, and it makes better use of hardware resources such as memory and SSDs;

If you need to use another storage engine, do so only on the advice of the DBA;

2.4 Database characteristics

1) [Recommended] Reduce dependence on database-specific features. For example, if the business relies on a feature that only exists in MySQL, future database migrations become harder;

2.5 Balancing normalization and redundancy

1) [Recommended] It is not necessary to adhere strictly to normal-form theory. Moderate redundancy is acceptable: short, frequently queried fields can be duplicated in other tables to avoid join queries, which can greatly improve query efficiency;

3. Database objects

3.1 Table design

3.1.1 Number of single database tables

1) [Mandatory] It is recommended that the number of single database tables be controlled within 500;

3.1.2 Single table data volume

1) [Mandatory] It is recommended that the data volume of a single table be controlled within 10 million (reference value);

Note: The appropriate number of records per table cannot be applied rigidly. It needs to be evaluated comprehensively based on the server's CPU, memory, and disk IO capabilities. For example, if the server has 168G of memory in total, the database's data files total 100G, and the InnoDB buffer pool is set to 120G, then even large tables with 30 million rows can be loaded entirely into memory, and disk IO puts no pressure on performance. As a rule of thumb, hot data accounts for about 10% of the total data; if the hot data can be cached in memory, disk IO will not be a performance bottleneck.

3.1.3 Number of fields in a single table

1) [Mandatory] It is recommended that the number of columns be controlled within 30;

Note: The purpose of limiting the number of columns in a single table is to limit the length of data rows and so avoid row migration and row chaining. How should row length be calculated to avoid them? MySQL stores data rows in data pages, whose size is 16KB by default. The File Header, Page Header, and File Trailer occupy 102 bytes, and the Page Directory, which records the location of rows within the page, also consumes page space. It is recommended to budget 1KB in total for this overhead, which leaves 15KB of usable space per page. If 15KB is evenly divisible by the row length, row chaining can be avoided. Using as few variable-length large fields as possible effectively reduces row migration.
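The arithmetic in the note above can be sketched as follows. This is only an illustration of the 16KB page and 1KB overhead budget described in the note; the function name rows_per_page is hypothetical, and real InnoDB rows carry extra per-row header bytes that this sketch ignores.

```python
from typing import Tuple

PAGE_SIZE = 16 * 1024      # default data page size in bytes, as stated in the note
PAGE_OVERHEAD = 1 * 1024   # headers, trailer and page directory, budgeted at 1KB


def rows_per_page(row_length: int) -> Tuple[int, int]:
    """Return (rows that fit in one page, bytes left unused) for a fixed row length."""
    usable = PAGE_SIZE - PAGE_OVERHEAD  # 15KB of usable space
    if not 0 < row_length <= usable:
        raise ValueError("row length must be between 1 and the usable page space")
    return usable // row_length, usable % row_length


for length in (512, 1000, 8020):
    rows, unused = rows_per_page(length)
    print(f"{length}-byte rows: {rows} per page, {unused} bytes unused")
```

A row length that divides 15KB evenly (such as 512 bytes) leaves no unused space, while a length that does not divide evenly wastes the remainder of every page.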

3.1.4 Separation of hot and cold data

1) [Recommended] Split large, infrequently accessed fields into separate tables to avoid wasting IO and cache resources. Columns that are frequently used together should be placed in one table, with appropriate redundancy allowed, to avoid extra join operations;

3.1.5 Database and table sharding strategy

1) [Recommended] If you use HASH to shard a table, use a decimal number as the suffix of the table name, starting from 1. With subsequent expansion in mind, the binary tree sharding strategy is recommended.

2) [Recommended] If the table is sharded by date and time, the table name needs to conform to the format YYYY[MM][DD][HH][mm][sss].

Note: Queries on very large tables are inefficient, so horizontal splitting needs to be considered. There are many ways to split based on business characteristics: tables whose data grows with time can be split by time, and other tables can be split by a HASH of the ID or by a calculation rule on certain specific fields.
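A minimal sketch of the HASH rule above, assuming a numeric ID and simple modulo hashing. The function name shard_table and the table name blog_user are illustrative only, and "binary tree sharding" is interpreted here as doubling the shard count on each expansion, which is an assumption rather than something the specification defines.

```python
def shard_table(base_name: str, record_id: int, shard_count: int) -> str:
    """Map a numeric ID to a table name with a decimal suffix starting at 1."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    return f"{base_name}_{record_id % shard_count + 1}"


# Doubling from 4 to 8 shards: a row either stays in its table
# or moves to exactly one new table (old suffix + 4).
for record_id in (10, 11, 12, 13):
    print(record_id,
          shard_table("blog_user", record_id, 4),
          shard_table("blog_user", record_id, 8))
```

Because id % 2n is always either id % n or id % n + n, each existing table only has to be split in two when the shard count doubles, so no data moves between the old tables.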

3.1.6 Summary table

1) [Recommended] Multi-table join queries can be very slow. Depending on the actual situation, consider computing the summary in the business layer and recording it in a summary table.

3.2 Field design

3.2.1 Basic specifications

1) [Mandatory] Column names and column types that store the same data must be consistent, otherwise it will cause implicit conversion, cause index failure, and reduce query efficiency;

2) [Mandatory] On the premise of meeting possible needs to the greatest extent, fields should be designed to be as short as possible to improve query efficiency and reduce resource consumption by indexes;

3) [Mandatory] The length of a data row should not exceed 8020 bytes. If it does, inserting two rows into one physical page causes row chaining, which fragments storage and reduces query efficiency;

4) [Mandatory] It is recommended that the number of columns in a single table be controlled within 30;

5) [Mandatory] Prefer integer fields over enumeration, character, and floating point types, including for values such as IP addresses;

6) [Mandatory] All fields require default values. If there are special circumstances, they will be discussed and decided separately;

3.2.2 Character field

1) [Mandatory] Select the CHAR type for fields whose length does not change much to reduce waste of resources.

2) [Mandatory] For other fields of uncertain length, use varchar related types uniformly.

3.2.3 Integer field

1) [Mandatory] Clarify the unsigned value and the integer type used.

2) [Mandatory] Use integer types wherever a value can be represented as an integer, such as enumerations, IP addresses, and small currency amounts, to improve query and join performance and to reduce storage and CPU overhead.
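As one way to follow the rule above for IP addresses, the sketch below converts an IPv4 address to an integer and back in the application layer using Python's standard ipaddress module. The helper names are illustrative. MySQL also provides the INET_ATON() and INET_NTOA() functions for the same conversion, and the largest IPv4 value (4294967295) fits in an unsigned 4-byte integer.

```python
import ipaddress


def ip_to_int(ip: str) -> int:
    """Convert dotted-quad IPv4 text to the integer stored in the database."""
    return int(ipaddress.IPv4Address(ip))


def int_to_ip(value: int) -> str:
    """Convert the stored integer back to dotted-quad text for display."""
    return str(ipaddress.IPv4Address(value))


stored = ip_to_int("192.168.1.10")
print(stored)             # 3232235786
print(int_to_ip(stored))  # 192.168.1.10
```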

3.2.4 Enum field

1) [Mandatory] It is forbidden to use enum, and tinyint can be used instead;

Note: Modifying an ENUM requires an ALTER statement, which is a DDL operation, and ORDER BY on an ENUM column is inefficient and requires additional work.

3.2.5 Default value

1) [Mandatory] All fields need default values, and NULL is not allowed, to avoid bugs caused by unusable indexes or null values. In special circumstances, an empty string can be stored instead of NULL;

Note: Nullable fields are difficult to optimize in queries, their indexes require additional space, and composite indexes on them can become ineffective. This reduces overall database performance and can easily cause null pointer exceptions in application-layer programs.

3.2.6 Binary data

1) [Mandatory] It is prohibited to store static resources such as images and binary files on the database. An appropriate file system should be used. The database only stores URLs. Binary multimedia data and oversized text data should not be placed in database fields;

3.2.7 Text/Blob field

1) [Mandatory] Generally avoid using text, blob and other types of fields, which will waste more disk and memory space. Unnecessary large-field queries will eliminate hot data, resulting in a sharp decrease in memory hit rate and affecting database performance.

2) [Mandatory] Consider using varchar instead. If you must use text/blob, keep it in a separate extended table. If you want to use an index, you can only use a prefix index.

3.2.8 Date and time fields

1) [Recommended] The timestamp type is compact, which improves query efficiency and reduces disk space and IO, but its range is only 1970 to 2038. Considering the history and future of the enterprise, it is recommended to use the int(10) type to store date and time stamps;
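The 2038 limit mentioned above corresponds to the largest value of a signed 32-bit count of seconds since 1970. The sketch below, with the hypothetical helper max_datetime, computes that bound and shows that a signed 4-byte integer column has the same limit, while an unsigned one extends it; whether to declare the column unsigned is a design choice this specification does not state.

```python
from datetime import datetime, timezone


def max_datetime(bits: int, signed: bool) -> datetime:
    """Latest UTC time representable as whole seconds since 1970 in an integer of this width."""
    max_seconds = 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1
    return datetime.fromtimestamp(max_seconds, tz=timezone.utc)


print(max_datetime(32, signed=True))   # 2038-01-19 03:14:07+00:00
print(max_datetime(32, signed=False))  # 2106-02-07 06:28:15+00:00
```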

3.2.9 Amount field

1) [Mandatory] It is prohibited to use float or double to define the amount field. It is recommended to use decimal type or bigint type;

2) [Mandatory] Use the decimal type for the amount field, and give it sufficient length and precision. In the case of strict performance requirements, use the bigint type, and the unit is cents (if it is other currencies, other units need to be defined).
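A short illustration of why the rules above rule out float and double, and of the "integer cents" alternative. The helper names to_cents and from_cents are hypothetical, and half-up rounding is an assumption; the business may require a different rounding rule.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_cents(amount: str) -> int:
    """Convert a decimal amount such as "19.99" to integer cents for a bigint column."""
    cents = (Decimal(amount) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(cents)


def from_cents(cents: int) -> Decimal:
    """Convert stored integer cents back to a two-decimal amount."""
    return (Decimal(cents) / 100).quantize(Decimal("0.01"))


print(0.1 + 0.2 == 0.3)                                      # False: binary floats are inexact
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))  # True
print(to_cents("19.99"), from_cents(1999))                   # 1999 19.99
```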

3.2.10 Others

Phone field

1) [Mandatory] Considering that area codes or country codes may involve symbols such as +, -, and (), and that fuzzy queries need to be supported, use a character type such as varchar;

Coordinate field

1) [Mandatory] A coordinate such as (0,0) should be stored in two columns rather than as "0,0" in one column.

Reserved fields

1) [Recommended] Avoid reserved fields. It is difficult to name them clearly, and because the type of data they will store is unknown, an appropriate type cannot be chosen. Reserved fields are a form of "over-design"; what we should do is "design on demand": after detailed and effective analysis, put only the necessary fields in the table instead of leaving a large number of spare fields.

3.3 Index design

Indexes improve query efficiency but reduce update efficiency, so more indexes are not always better. The principle is: do not add an index if you can do without it, and add one only when it is truly needed.

3.3.1 Number of single table indexes

1) [Recommended] The number of indexes on a single table should not exceed 5.

3.3.2 Number of fields in a single index

1) [Mandatory] The number of fields in a single index does not exceed 5.

3.3.3 Field selection

1) [Mandatory] For frequently updated fields, it is necessary to evaluate the read-write ratio and the performance benefits after creating an index before deciding whether to create an index.

For example, suppose a field is updated 20 times per second but queried 100 times per second, and data rows are located directly through that field. Without an index, each query causes a full table scan, and any update that uses the field to locate rows also causes a full table scan. In this case an index must be created. (Conversely, if rows can be located by their ID and the updated field is never used to locate rows, the field is not a good candidate for an index.)

2) [Mandatory] For fields with little distinction such as "gender", establishing an index will only have limited improvement in query performance, and is not much different from a full table scan.

3) [Mandatory] For fields that have already established a unique index, there is no need to establish a joint index related to this field.

4) [Mandatory] Do not create indexes or joint indexes for fields that do not appear in the query conditions.

3.3.4 Joint index

1) [Mandatory] The order of each field in the joint index must be consistent with the order of the fields in the query statement, otherwise the index may not be applied.

2) [Mandatory] The one with the highest degree of differentiation is placed on the leftmost side of the joint index.

3) [Mandatory] The most frequently used columns are placed on the left side of the joint index.

4) [Mandatory] Try to place the column with the smallest field length on the leftmost side of the joint index.

3.3.5 Prefix index

1) [Recommended] Create a prefix index for long string fields.

When the column to be indexed holds long strings, the index becomes large and slow. In that case, index only a prefix of the column, choosing a prefix long enough to keep duplicate index values low. This saves index space and index maintenance cost while still filtering data quickly and effectively.
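One common way to choose a prefix length is to compare the selectivity of each prefix (distinct prefixes divided by total rows) with the selectivity of the full column. The sketch below does this on a sample of values in the application layer; the function names, the 0.95 target, and the sample data are all illustrative assumptions, not part of the specification.

```python
def prefix_selectivity(values, length):
    """Fraction of distinct prefixes of the given length among all values."""
    if not values:
        raise ValueError("values must not be empty")
    return len({v[:length] for v in values}) / len(values)


def shortest_prefix(values, target=0.95, max_length=64):
    """Smallest prefix length whose selectivity reaches `target` times the full-column selectivity."""
    full = len(set(values)) / len(values)
    for length in range(1, max_length + 1):
        if prefix_selectivity(values, length) >= target * full:
            return length
    return max_length


emails = ["alice@example.com", "alina@example.com",
          "bob@example.com", "bobby@example.org"]
print(prefix_selectivity(emails, 3))  # 0.5
print(shortest_prefix(emails))        # 4
```

In practice the sample should be large enough to represent the real data distribution, since a prefix that looks selective on a few rows may not be on the full table.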

3.3.6 Index type

1) [Mandatory] For one or more fields that uniquely identify a record, a primary key or unique index must be created. For fields that do not uniquely identify a record, create an ordinary index where needed to improve query efficiency.

3.4 Primary key design

1) [Recommended] Generally do not use joint primary keys.

2) [Mandatory] A primary key must be specified. It is recommended to use memory-type, numeric fields as the primary key to cope with high-concurrency big data scenarios. An auto-increment column depends to some extent on the characteristics of the database itself, and global uniqueness in a distributed environment must also be considered. A UUID is a character type, which increases index disk space and CPU overhead, and it is not auto-incrementing.

3.5 Other regulations

In Internet businesses with big data and high concurrency, the architectural design idea is to free up the database and let the application layer take on more responsibility. Objects tied to the characteristics of the database itself, such as stored procedures, triggers, and views, are generally prohibited, in order to reduce business coupling and let the database do what it does best.

3.5.1 Trigger

1) [Recommended] It is forbidden to use the trigger feature of the database. Please seek corresponding solutions at the application layer. If there are special needs, we will separately study and decide.

3.5.2 Stored procedures

1) [Recommended] It is forbidden to use the stored procedure feature of the database. Please seek corresponding solutions at the application layer. If there are special needs, we will separately study and decide.

3.5.3 Function

1) [Recommended] It is forbidden to use the function features of the database. Please seek corresponding solutions at the application layer. If there are special needs, we will separately study and decide.

3.5.4 Foreign keys

1) [Mandatory] It is prohibited to use the foreign key feature of the database. Please seek corresponding solutions at the application layer. If there are special needs, we will separately study and decide.

Note: Foreign keys will cause coupling between tables. Update and delete operations will involve related tables, affecting the performance of SQL and even causing deadlocks. In big data high-concurrency business scenarios, it is easy to cause a significant decline in database performance.

3.5.5 Constraint design

1) [Mandatory] This specification prohibits the use of database constraint features. Please seek corresponding solutions at the application layer. If there are special needs, we will separately study and decide.

Note: The primary key itself will have unique constraints. Other constraints such as check, foreign keys, etc. are recommended to be implemented at the application layer.

3.5.6 Table partition

1) [Mandatory] This specification prohibits the use of the table partition feature of the database. Please seek corresponding solutions at the application layer. If there are special needs, we will separately study and decide.

Note: A partitioned table appears physically as multiple files and logically as one table. Its actual performance is not very good, and its management and maintenance costs are high. It is recommended to manage big data by splitting it into physical tables instead; see the documentation on database and table sharding strategy.

4. Naming

4.1 Basic regulations

All tables, views, indexes, triggers, functions and stored procedures of the database should follow the following naming convention:

1) [Mandatory] Uniform lowercase format.

2) [Mandatory] Use only English letters, numbers, and underscores in names. Other characters, such as hyphens, are prohibited.

3) [Mandatory] Names must not exceed 32 characters and must be clear and easy to recognize.

4) [Mandatory] It is forbidden to use Pinyin for naming, and it is forbidden to mix Pinyin and English.

5) [Mandatory] It is forbidden to use keywords. You can add prefixes to distinguish keywords. See Appendix 1 "Keyword List"

6) [Recommended] Temporary library and temporary table names must be prefixed with tmp and suffixed with timestamp.

7) [Recommended] The backup database and backup table names must be prefixed with bak and suffixed with timestamp.

8) [Recommended] Column names that store the same data in different tables must be consistent.
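Rules 1), 2), 3), and 5) above are mechanical enough to check automatically. The sketch below is one possible checker; the function name naming_problems is hypothetical, and RESERVED is only a tiny illustrative subset, since the specification's Appendix 1 keyword list is not reproduced here.

```python
import re

# Lowercase letters, digits and underscores only, at most 32 characters.
NAME_PATTERN = re.compile(r"[a-z0-9_]{1,32}")

# Illustrative subset only; substitute the full keyword list in real use.
RESERVED = {"select", "order", "group", "index", "key", "table"}


def naming_problems(name):
    """Return a list of rule violations for a database object name (empty if none)."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("must be 1-32 characters of lowercase letters, digits and underscores")
    if name.lower() in RESERVED:
        problems.append("is a reserved keyword")
    return problems


for candidate in ("blog_user", "Blog-User", "order"):
    print(candidate, naming_problems(candidate))
```

The Pinyin rule in 4) is not covered, because it cannot be checked reliably with a simple pattern.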

4.2 Library naming

1) [Recommended] Reference format: <prefix>[_business type/product type/other types]_<library name>

Prefix: required, such as baidu.

Type: optional, but it must be applied uniformly: either all libraries include it or none do. Reference types: product type/business type/other types.

Library name: It should be as consistent as possible with the name of the business module it serves.

Positive example:

name                  <prefix>_<library name>   <prefix>_<type>_<library name>
Blog library          baidu_blog                baidu_ssp_blog
College library       baidu_edu                 baidu_ssp_edu
Home library          baidu_home                baidu_ssp_home
User center library   baidu_ucenter             baidu_ssp_ucenter
CMS library           baidu_cms                 baidu_ssp_cms
Download library      baidu_down                baidu_ssp_down
Log library           baidu_log                 baidu_ssp_log

4.3 Table naming

4.3.1 General table

1) [Recommended] Reference format: <Library name/Library name abbreviation>_<Table name/Table name abbreviation>.

The table name should be as consistent as possible with the name of the business module it serves.

Table names should try to contain words or abbreviations corresponding to the data stored.

Tables of the same module should be prefixed with the module name (or abbreviation) as much as possible.

Positive example:

name                         <Library name abbreviation>_<Table name/Table name abbreviation>
Blog user table              blog_user
Blog post table              blog_blog
Blog post content table      blog_blog_content
Blog comment table           blog_comments
Blog user statistics table   blog_user_stat

4.3.2 Association table

1) [Recommended] Reference format: <Library name/Library name abbreviation>_<Table name 1>_<Table name 2>_rel.

Positive example:

name                           Table name 1   Table name 2   <Library name>_<Table name 1>_<Table name 2>_rel
Class user association table   blog_class     blog_user      blog_class_user_rel

4.4 Field naming

1) [Recommended] Reference format: [prefix_]<field name>

Generally, no prefix is used (if the name conflicts with a keyword, consider adding a prefix to differentiate it).

Field names should also be kept as consistent as possible with the actual data.

Positive example:

name            [prefix_]<field name>
User ID         user_id
User name       user_name
Phone number    phone
Creation time   create_time
Status          status

4.5 Index naming

1) [Recommended] Ordinary index: idx_<table name/table name abbreviation>_<column name/column name abbreviation[_column name/column name abbreviation]>.

2) [Recommended] Unique index: uidx_<table name/table name abbreviation>_<column name/column name abbreviation[_column name/column name abbreviation]>.

Remarks:

idx: stands for index.

uidx: stands for unique index.

A joint index name should include the names or abbreviations of all indexed fields where possible, and the order of the field names in the index name must match the order of the fields in the index.

Positive example:

Ordinary index       Unique index
idx_users_username   uidx_users_uid_username: (user_id, username)

5. SQL statements

5.1 in/or

OR conditions have O(n) efficiency, while IN has O(log n) efficiency.

1) [Mandatory] Avoid using OR to connect conditions in a clause wherever possible; otherwise the engine may give up using the index and perform a full table scan.

2) [Mandatory] It is recommended that the number of in should be controlled within 1000 to avoid using in in large collections.

5.2 select *

1) [Mandatory] It is forbidden to use SELECT *. The application layer should specify the required fields to avoid unnecessary consumption of CPU, hard disk IO and network bandwidth.

Positive example: SELECT `blog_id` FROM `blog`;

Counter example: SELECT * FROM `blog`;

5.3 union all

1) [Recommended] Use union all instead of union. Union has the overhead of deduplication. Try to implement deduplication by the application layer.

5.4 Fuzzy query

1) [Mandatory] Fully fuzzy queries, such as like '%xxx%', are prohibited because they cannot use indexes and result in full table scans.

2) [Mandatory] Right fuzzy queries, such as like 'xxx%', are allowed, and the index can be applied normally.

5.5 Reverse query

1) [Mandatory] It is forbidden to use reverse queries, such as NOT, !=, <>, NOT IN, NOT LIKE, etc., which can lead to a full table scan.

5.6 Implicit type conversion

1) [Mandatory] It is forbidden to use implicit conversion, which will cause index failure.

Description: Implicit conversion occurs when an operator is used with operands of different types and a type conversion takes place to make the operands compatible. For example, the user_id field is designed as an int type in the database; writing it as a string in SQL forces a type conversion, which can cause the index to fail.

5.7 join

1) [Mandatory] JOIN queries on large tables are prohibited when the join fields and other filter condition fields lack appropriate indexes.

Note: If a join query on a large table performs a full table scan, a temporary table is generated, which consumes more memory and CPU and greatly affects database performance.

2) [Recommended] Joining 3 or more tables in one query is prohibited. When writing SQL queries, use explain to analyze execution efficiency (indicators: number of scanned rows and whether indexes are used). If the join is more efficient than the equivalent single-table queries, joining 3 tables is allowed.

5.8 SQL expressions

1) [Recommended] Avoid using mathematical operations, functions, etc. in SQL executed by the database. They easily couple business logic to the DB and can easily lead to index failure.

5.9 Interaction

1) [Mandatory] Reduce the number of interactions with the database, that is, it is forbidden to query the database in a loop.
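A sketch of replacing a per-ID query loop with one query per chunk of IDs. It runs against an in-memory SQLite database purely as a stand-in so that the example is self-contained; the blog table, its columns, and the function name fetch_titles are assumptions, and the default chunk size of 1000 follows the IN-list limit in section 5.1.

```python
import sqlite3


def fetch_titles(conn, blog_ids, chunk_size=1000):
    """Fetch titles for many IDs with one query per chunk instead of one query per ID."""
    titles = {}
    for start in range(0, len(blog_ids), chunk_size):
        chunk = blog_ids[start:start + chunk_size]
        placeholders = ", ".join("?" for _ in chunk)
        sql = f"SELECT blog_id, title FROM blog WHERE blog_id IN ({placeholders})"
        # Each row is a (blog_id, title) pair, so it can feed dict.update directly.
        titles.update(conn.execute(sql, chunk).fetchall())
    return titles


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blog (blog_id INTEGER PRIMARY KEY, title TEXT NOT NULL DEFAULT '')")
conn.executemany("INSERT INTO blog (blog_id, title) VALUES (?, ?)",
                 [(1, "first"), (2, "second"), (3, "third")])
print(fetch_titles(conn, [1, 3]))  # {1: 'first', 3: 'third'}
```

With N IDs this issues roughly N/1000 queries rather than N, and the values are bound as parameters rather than concatenated into the SQL text.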

5.10 Large batches

1) [Recommended] For Insert statements, tests show that inserting 1,000 rows per batch is most efficient. Larger sets need to be split into batches of 1,000, and repeated inserts of the same kind should be merged into batches.

Note: Large batch write operations generate a large volume of logs, and log transmission and replay take a long time, causing serious data synchronization delay between master and slave. When this delay leads to inconsistent data, consider forcing the query to go directly to the master database.
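The batching rule above can be sketched as follows, again using in-memory SQLite as a stand-in for MySQL. The table layout and the function name insert_in_batches are illustrative; each batch is committed separately so that no single transaction covers the whole load.

```python
import sqlite3

BATCH_SIZE = 1000  # batch size recommended by the rule above


def insert_in_batches(conn, rows, batch_size=BATCH_SIZE):
    """Insert rows in batches, committing after each batch; returns the number of batches."""
    batches = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        with conn:  # commits on success, rolls back this batch on error
            conn.executemany(
                "INSERT INTO blog (blog_id, title, user_id) VALUES (?, ?, ?)", batch)
        batches += 1
    return batches


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blog (blog_id INTEGER PRIMARY KEY, "
             "title TEXT NOT NULL DEFAULT '', user_id INTEGER NOT NULL DEFAULT 0)")
rows = [(i, f"title {i}", 1) for i in range(1, 2501)]
print(insert_in_batches(conn, rows))  # 3
```

The insert statement lists its columns explicitly, and 2,500 rows are split into batches of 1,000, 1,000, and 500.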

5.11 Large transactions

1) [Recommended] Follow the principle of minimal transaction scope.

2) [Recommended] Keep the transaction as simple as possible and the transaction time as short as possible.

Note: Modifying data in large batches within a single transaction locks a large amount of data in the table and causes extensive blocking, which has a great impact on database performance.

5.12 Index field order

1) [Mandatory] For the fields in the query conditions, the most effective index field must be written first, and the order of the fields in the joint index must be paid attention to.

5.13 insert

1) [Mandatory] It is forbidden to use INSERT INTO t_xxx VALUES(xxx); the columns being inserted must be listed explicitly.

Positive example: INSERT INTO blog (`blog_id`, `title`, `user_id`) VALUES (1, 'title', 1)

Counter example: INSERT INTO blog VALUES(1,'Title','1')

5.14 DDL operations

1) [Mandatory] All DDL operations are prohibited in application code.

Note: If there are special needs, consultation and consent are required before use.

5.15 Sorting

1) [Recommended] When GROUP BY is used, sorting is performed by default (in MySQL 5.7 and earlier; MySQL 8.0 removed implicit GROUP BY sorting). When you do not need sorting, you can use order by null.

5.16 Aggregation functions

1) [Mandatory] Use count(1) and count(*) instead of count(column_name).

Description: count(1)≈count(*)>count(primary key ID)>count(column)

count(*) can be understood as equivalent to count(0): MySQL converts the parameter * into the parameter 0 for processing, so count(*) and count(1) execute in basically the same way and there is no performance difference between them.

When count(1), count(*), or count(primary key field) is executed and the table has a secondary index, the optimizer will choose the secondary index for scanning.

Do not use count(column) to count the number of records, because it is the least efficient and uses a full table scan. If you must count the records in which this column is not NULL, it is recommended to create a secondary index on the column.

The count() function will not return NULL, but the sum() function may return NULL.
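The NULL behavior described above can be observed directly. The sketch uses in-memory SQLite as a stand-in, since count() and sum() treat NULL the same way there as the text describes for MySQL; the orders table and the nullable amount column are illustrative only (the specification itself forbids nullable fields). COALESCE is shown as one way to turn a NULL sum into 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount INTEGER)")


def totals(conn):
    """Return (count(*), count(amount), sum(amount), coalesce(sum(amount), 0))."""
    return conn.execute(
        "SELECT COUNT(*), COUNT(amount), SUM(amount), COALESCE(SUM(amount), 0) FROM orders"
    ).fetchone()


print(totals(conn))  # (0, 0, None, 0): sum() over no rows is NULL, count() is 0

conn.executemany("INSERT INTO orders (order_id, amount) VALUES (?, ?)",
                 [(1, 100), (2, None)])
print(totals(conn))  # (2, 1, 100, 100): count(amount) skips the NULL row
```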

6. Database domain name

1) [Mandatory] It is forbidden to use IP to connect to the database.

Positive example:

Domain name specifications for each environment (xxx business module)

name                      domain name
Development environment   dev.xxx.db
Test environment          test.xxx.db
Production environment    prod.xxx.db
……                        ……

Master-slave library domain name naming specifications

name                                  domain name
Production environment main library   prod-master.xxx.db
Production environment slave library 01   prod-slave-01.xxx.db
Production environment slave library 02   prod-slave-02.xxx.db
……                                    ……

Note:

Production environment: from the English word production, abbreviated as prod.

Development environment: from the English word development, abbreviated as dev.

Test environment: from the English word test, written as test.

Slave library: from the English word slave, written as slave.

Main library: from the English word master, written as master.

7. User behavior

1) [Mandatory] It is forbidden to assign accounts with super permissions to applications. Super permissions can only be reserved for accounts used by DBAs to handle problems.

2) [Mandatory] It is forbidden to store plain text passwords in the database.

3) [Mandatory] Direct connection to the online database from the development environment and test environment is prohibited.

4) [Mandatory] Online database stress testing is prohibited.

5) [Mandatory] It is forbidden to use IP to connect to the database, and intranet domain names should be used.

6) [Mandatory] It is forbidden to create a test library in the production environment.

7) [Mandatory] Reasonably allocate the permissions of database accounts. For example, application accounts are not allowed to have drop permissions in principle.

8) [Recommended] Notify the DBA in advance before importing or exporting data, and ask the DBA to help monitor the operation.

9) [Recommended] Promotional activities or new feature launches must be reported to the DBA in advance for traffic assessment.

10) [Recommended] Do not run batch updates or batch queries against the database during peak business periods.

11) [Recommended] DDL/DML operations need to be reviewed by the DBA, and indicators such as service load should be observed during execution.

12) [Recommended] For particularly important database tables, communicate with the DBA in advance to determine maintenance and backup priorities.
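Rule 2) above prohibits storing plain text passwords. One common alternative is to store a random salt and a slow, salted hash; the sketch below uses PBKDF2 from Python's standard hashlib module. The function names and the iteration count are assumptions to be tuned for the deployment, and the specification itself does not prescribe any particular algorithm.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for your hardware and policy


def hash_password(password, salt=None):
    """Return (salt, digest); store both in the database instead of the password."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt, digest = hash_password("s3cret")
print(verify_password("s3cret", salt, digest))  # True
print(verify_password("wrong", salt, digest))   # False
```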

Origin blog.csdn.net/weixin_40381772/article/details/133338239