MySQL table creation conventions and notes

These notes follow the Alibaba P3C specification (the Alibaba Java Development Manual):

https://github.com/alibaba/p3c/blob/master/%E9%98%BF%E9%87%8C%E5%B7%B4%E5%B7%B4Java%E5%BC%80%E5%8F%91%E6%89%8B%E5%86%8C%EF%BC%88%E7%BA%AA%E5%BF%B5%E7%89%88%EF%BC%89.pdf

 

  1. Table creation conventions

        1). A field that expresses a yes/no concept must be named is_xxx, with data type unsigned tinyint (1 means yes, 0 means no).

        2). Table names must use only lowercase letters and digits (table names are not case-sensitive on Windows but are case-sensitive on Linux by default), and the part between two underscores must not consist solely of digits (wrong: table1_3_system).

        3). Use the singular form for table names, and do not use MySQL reserved words such as desc, range, match, and delayed. Index names follow this convention:

            pk_xx is a primary key index; uk_xx is a unique index; idx_xx is an ordinary index.

        4). Use decimal for decimal values such as prices, salaries, and remittance amounts, rather than float or double, which lose precision.

        5). If the stored strings are all nearly the same length, use char (length 0-255). varchar is variable-length and does not preallocate storage space; keep its length within 5000. If a value can be longer than that, define the field as text and move it into a separate table linked by the primary key, so that it does not reduce the index efficiency of the other fields.

        6). Every table needs three fields: id, gmt_create, and gmt_modified. id is the primary key and must be unsigned, either int or bigint. The other two are of type datetime: gmt_create records when the row was created, and gmt_modified records when it was last updated.

        7). Name tables as "business name_function of the table"; the database name should match the application name.

        8). Tables may carry appropriate redundant fields to speed up queries; the benefit is most visible on large data sets. A redundant field should be one that rarely changes, and should not be a very long varchar, let alone a text field.

        9). Splitting into multiple databases and tables (sharding) is recommended only when a single table exceeds 5 million rows or 2 GB. If the data is not expected to reach that size within three years, leave sharding out of the design.

        10). Choosing an appropriate character length saves table and index storage and also improves retrieval speed.
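
As a rough illustration of the table conventions above, the following MySQL sketch defines one table that applies them. The database context and every name in it (trade_order, trade_order_remark, and their columns) are assumptions made up for this example and do not come from the P3C document.

```sql
-- Hypothetical tables for an application called "trade"; all names are illustrative.
CREATE TABLE trade_order (
    id           BIGINT UNSIGNED  NOT NULL AUTO_INCREMENT COMMENT 'primary key, unsigned',
    order_no     CHAR(20)         NOT NULL COMMENT 'values are all the same length, so char',
    buyer_name   VARCHAR(64)      NOT NULL COMMENT 'variable length, far below 5000',
    total_price  DECIMAL(12, 2)   NOT NULL COMMENT 'money is stored as decimal',
    is_paid      TINYINT UNSIGNED NOT NULL DEFAULT 0 COMMENT '1 = yes, 0 = no',
    gmt_create   DATETIME         NOT NULL COMMENT 'creation time',
    gmt_modified DATETIME         NOT NULL COMMENT 'last update time',
    -- MySQL always names the primary key index PRIMARY, so pk_ cannot be applied here.
    PRIMARY KEY (id),
    UNIQUE KEY uk_order_no (order_no),
    KEY idx_buyer_name (buyer_name)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;

-- Long text is kept in its own table and tied to the main row by the primary key value.
CREATE TABLE trade_order_remark (
    id           BIGINT UNSIGNED NOT NULL COMMENT 'same value as trade_order.id',
    remark       TEXT            NOT NULL,
    gmt_create   DATETIME        NOT NULL,
    gmt_modified DATETIME        NOT NULL,
    PRIMARY KEY (id)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;
```

Keeping the text column in a second table means queries against trade_order never have to read the long values.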

            

  2. Index

        1). A field that is unique in business terms, including a combination of fields, must have a unique index. (Do not assume that a unique index slows down inserts: the cost is negligible, while the gain in query speed is significant. Even with thorough validation at the application layer, Murphy's law says that without a unique index dirty data will eventually appear.)

        2). Joins across more than three tables are prohibited. When a join is needed, pay attention to indexes and SQL performance; the joined fields must have the same data type and must be indexed.

        3). When indexing a varchar field, specify an index length; there is no need to index the whole field. An index of length 20 generally has selectivity above 90%. Use count(distinct left(column_name, index_length)) / count(*) to measure it.

        4). Left-fuzzy and full-fuzzy matching (like '%xxx' and like '%xxx%') are prohibited for page searches; use a search engine if needed. The index is a B-Tree with leftmost-prefix matching, so it cannot be used when the left part of the value is undetermined.

        5). When a query has order by, make use of the ordering of the index. The last order by field should be part of the composite index and placed at the end of it, to avoid "Using filesort", which hurts query performance. For example, for where a = ? and b = ? order by c, the index is a_b_c. If there is a range condition on the index, its ordering cannot be used: for where a > 10 order by b, the index a_b cannot be used for sorting.

        6). Use covering indexes so that queries do not have to go back to the table. In the output of explain, the Extra column then shows: Using index.

        7). Use a deferred join or a subquery to optimize deep-pagination queries: first locate the ids of the required rows, then join back to fetch the rows themselves.

            For example: select a.* from table1 as a, (select id from table1 where <condition> limit 1000, 100) as b where a.id = b.id

        8). SQL performance targets: at least the range level, preferably ref, and consts where possible.

            consts: at most one matching row (primary key or unique index); explain shows it as const in the type column

            ref: lookup on an ordinary (non-unique) index

            range: range scan on an index

        9). When building a composite index, put the most selective column on the left.

            For example, for where a = ? and b = ?, if column a alone is almost unique, an index on a is enough.

            Note: when a query mixes equality and non-equality conditions, put the equality column first in the index. For example, for where a > ? and b = ?, b must come first even if a is more selective, that is, idx_b_a.

        10). Prevent implicit type conversion caused by mismatched field types, which makes the index unusable.

        11). Avoid these extreme views when creating indexes:

            Better too many than too few: believing that every query needs its own index.

            Better too few than too many: believing that indexes consume space and seriously slow down updates and inserts.

            Resisting unique indexes: believing that business uniqueness must always be enforced at the application layer by "check first, then insert".
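
The index design rules in items 3, 5, 9 and 10 can be sketched roughly as follows. This is a minimal MySQL illustration: the table user_profile, its columns, and the literal values are assumptions invented for the example, and the actual explain output depends on the data and the MySQL version.

```sql
-- Hypothetical table; names and values are illustrative only.
CREATE TABLE user_profile (
    id           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    city_code    CHAR(6)         NOT NULL,
    home_address VARCHAR(255)    NOT NULL,
    gmt_create   DATETIME        NOT NULL,
    gmt_modified DATETIME        NOT NULL,
    PRIMARY KEY (id)
) ENGINE = InnoDB;

-- Item 3: measure how selective a 20-character prefix is (closer to 1 is better).
SELECT COUNT(DISTINCT LEFT(home_address, 20)) / COUNT(*) AS prefix_selectivity
FROM user_profile;

-- If the ratio is high enough, index only that prefix instead of the full column.
ALTER TABLE user_profile ADD INDEX idx_home_address (home_address(20));

-- Items 5 and 9: equality column first, order by column last.
ALTER TABLE user_profile
    ADD INDEX idx_city_code_gmt_create (city_code, gmt_create);

-- Equality on city_code: rows can be read in index order for the sort.
EXPLAIN SELECT id FROM user_profile
WHERE city_code = '310000'
ORDER BY gmt_create;

-- Range on city_code: the index order of gmt_create no longer helps the sort.
EXPLAIN SELECT id FROM user_profile
WHERE city_code > '310000'
ORDER BY gmt_create;

-- Item 10: city_code is a string column, so comparing it with a number forces
-- an implicit conversion and typically keeps the index from being used.
EXPLAIN SELECT id FROM user_profile WHERE city_code = 310000;
```

Comparing the Extra column of the two order by queries on a populated table is a quick way to see whether "Using filesort" appears.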

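Items 6 and 7 of the index section (covering index and deferred join) might look like this in practice. The sketch assumes a hypothetical article table; the table, its columns, and the author_id value are made up for illustration.

```sql
-- Hypothetical table; names and values are illustrative only.
CREATE TABLE article (
    id           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    author_id    BIGINT UNSIGNED NOT NULL,
    title        VARCHAR(200)    NOT NULL,
    gmt_create   DATETIME        NOT NULL,
    gmt_modified DATETIME        NOT NULL,
    PRIMARY KEY (id),
    KEY idx_author_id_gmt_create (author_id, gmt_create)
) ENGINE = InnoDB;

-- Covering index: author_id and gmt_create are index columns, and InnoDB stores
-- the primary key id in every secondary index entry, so the table itself is not read.
EXPLAIN SELECT id, gmt_create FROM article WHERE author_id = 42;

-- Deferred join: the inner query walks only the index to find the 20 ids
-- of the requested page; the outer query then fetches just those rows in full.
SELECT a.*
FROM article AS a
JOIN (
    SELECT id
    FROM article
    WHERE author_id = 42
    ORDER BY gmt_create
    LIMIT 100000, 20
) AS b ON a.id = b.id
ORDER BY a.gmt_create;
```

Without the inner query, MySQL would have to read and discard the full rows for the first 100000 entries before returning the page.
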
  3. SQL statements

        1). Do not use count(column_name) or count(1) in place of count(*). count(*) is the standard SQL syntax for counting rows. count(*) includes rows that contain NULL values, whereas count(column_name) skips rows where that column is NULL.

        2). count(distinct col) counts the distinct non-NULL values in the column. For count(distinct col1, col2), if one of the columns is NULL in every row, the result is 0 even when the other column has distinct values.

        3). Note that when no rows match, count(*) returns 0 but sum(col) returns NULL, which can cause a NullPointerException (NPE) in application code. One way to avoid it: select if(isnull(sum(g)), 0, sum(g)) from table.

        4). Use ISNULL() to test for NULL. Comparing NULL with any value yields NULL, not true or false.

        5). In paging code, if count(*) is 0, return immediately and skip the paging query that would follow.

        6). Do not use foreign keys or cascades; all foreign-key relationships must be enforced at the application layer.

            Take a student table and a grade table as an example: student_id is the primary key of the student table and a foreign key in the grade table. If updating student_id in the student table also triggers an update of student_id in the grade table, that is a cascading update. Foreign keys and cascades suit single-machine, low-concurrency setups, not distributed ones, and they slow down updates and inserts.

        7). Do not use stored procedures; they are hard to debug and extend, and are not portable.

        8). When correcting data, especially when deleting or modifying records, run a select first and execute the change only after confirming the affected rows, to avoid deleting data by mistake.

        9). Avoid in where possible. If it cannot be avoided, keep the number of elements in the in list within 1000.
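
The NULL behaviour described in items 1 to 4 of this section can be checked with a small experiment. The table score_record and its columns g and h are hypothetical names chosen for this sketch; h is left NULL in every row on purpose.

```sql
-- Hypothetical table; g and h are nullable so that the NULL rules are visible.
CREATE TABLE score_record (
    id           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    g            INT             NULL,
    h            INT             NULL,
    gmt_create   DATETIME        NOT NULL,
    gmt_modified DATETIME        NOT NULL,
    PRIMARY KEY (id)
) ENGINE = InnoDB;

INSERT INTO score_record (g, h, gmt_create, gmt_modified) VALUES
    (10,   NULL, NOW(), NOW()),
    (10,   NULL, NOW(), NOW()),
    (NULL, NULL, NOW(), NOW());

SELECT COUNT(*)             AS all_rows,       -- 3: rows with NULL are counted
       COUNT(g)             AS non_null_g,     -- 2: the NULL in g is skipped
       COUNT(DISTINCT g)    AS distinct_g,     -- 1: only the value 10
       COUNT(DISTINCT g, h) AS distinct_pairs  -- 0: h is NULL in every row
FROM score_record;

-- No row matches, so SUM() returns NULL while the guarded form returns 0.
SELECT SUM(g)                        AS raw_sum,   -- NULL
       IF(ISNULL(SUM(g)), 0, SUM(g)) AS safe_sum   -- 0
FROM score_record
WHERE g > 100;

-- Comparisons with NULL yield NULL; ISNULL() gives a definite answer.
SELECT NULL = NULL, NULL <> 1, ISNULL(NULL);       -- NULL, NULL, 1
```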

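For the data correction rule in item 8, one possible workflow is to run the select and the delete with the same where clause inside a transaction. The coupon table, its columns, and the cutoff date below are assumptions made for the sake of the example.

```sql
-- Hypothetical table and predicate; names and values are illustrative only.
CREATE TABLE coupon (
    id           BIGINT UNSIGNED  NOT NULL AUTO_INCREMENT,
    code         CHAR(12)         NOT NULL,
    is_expired   TINYINT UNSIGNED NOT NULL DEFAULT 0,
    gmt_create   DATETIME         NOT NULL,
    gmt_modified DATETIME         NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uk_code (code)
) ENGINE = InnoDB;

START TRANSACTION;

-- Step 1: look at exactly the rows the delete would touch.
SELECT id, code, gmt_create
FROM coupon
WHERE is_expired = 1 AND gmt_create < '2020-01-01 00:00:00';

-- Step 2: run the delete with the same where clause.
DELETE FROM coupon
WHERE is_expired = 1 AND gmt_create < '2020-01-01 00:00:00';

-- Step 3: the number of deleted rows should match what step 1 returned.
SELECT ROW_COUNT();

-- Keep the change only if the counts agree; otherwise issue ROLLBACK instead.
COMMIT;
```
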
        

 

 
