Classic database problem

1. Why use an auto-increment column as the primary key?

1. If a primary key (PRIMARY KEY) is defined, InnoDB chooses it as the clustered index. If no primary key is explicitly defined, InnoDB chooses the first unique index whose columns do not allow NULL values as the clustered index. If no such unique index exists, InnoDB generates a hidden 6-byte ROWID as the implicit clustered index (the ROWID increases monotonically as row records are written; unlike Oracle's ROWID, it is implicit and cannot be referenced).

2. The data records themselves are stored in the leaf nodes of the primary index (a B+Tree). This requires that records within the same leaf node (one memory or disk page in size) be stored in primary key order, so whenever a new record is inserted, MySQL places it in the appropriate node and position according to its primary key. If the page reaches its fill factor (InnoDB's default is 15/16), a new page (node) is allocated.

3. If the table uses an auto-incrementing primary key, each newly inserted record is appended sequentially after the last entry of the current index node. When a page fills up, a new page is opened automatically.

4. If a non-sequential primary key is used (such as an ID card number or student number), the inserted key values are roughly random, so each new record must be placed somewhere in the middle of an existing index page. MySQL then has to move data to fit the new record into position, and the target page may already have been flushed to disk and evicted from the cache, in which case it must be read back from disk, adding significant overhead. The frequent moves and page splits also produce a lot of fragmentation and an insufficiently compact index structure, which later has to be repaired by rebuilding the table with OPTIMIZE TABLE to refill the pages.
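The contrast above can be sketched in DDL (table and column names are illustrative):

```sql
-- Sequential primary key: new rows are appended at the end of the
-- clustered index, so pages fill in order with no splits.
CREATE TABLE t_auto (
    id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    body VARCHAR(255),
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Roughly random primary key (e.g. an ID card number): new rows land in
-- the middle of existing pages, causing page splits and fragmentation.
CREATE TABLE t_random (
    id_card CHAR(18) NOT NULL,
    body    VARCHAR(255),
    PRIMARY KEY (id_card)
) ENGINE=InnoDB;
```

After heavy random-key insertion, `OPTIMIZE TABLE t_random;` can rebuild the table into a compact form, as the text notes.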

2. Why can a data index improve query efficiency?

1. Index entries are stored in sorted order

2. Because the entries are ordered, finding a record through the index does not require traversing all index records

3. In the best case, index lookup approaches binary-search efficiency, i.e. close to log2(N) comparisons

3. The difference between a B+ tree index and a hash index

A B+ tree is a balanced multi-way tree. The height from the root node to every leaf node differs by no more than 1, and nodes on the same level are linked by pointers, so the index is ordered.

A hash index applies a hash function to convert the key into a hash value. It does not need to descend level by level from root to leaf as a B+ tree does; a single hash computation locates the entry. The result is unordered.

4. Advantages of a hash index:

1. Equality queries. A hash index has an absolute advantage here, provided there are not many duplicate key values; with many duplicates, hash index efficiency drops sharply because of hash collisions.

5. Scenarios where a hash index is not applicable:

1. Range queries are not supported

2. The index cannot be used to satisfy sorting (ORDER BY)

3. The leftmost-prefix matching rule of composite indexes is not supported

In general, a B+ tree index suits most scenarios; a hash index is more advantageous in cases like the following:

In a HEAP (MEMORY) table, if the stored column has a low degree of duplication (that is, high cardinality) and is queried mainly by equality, with no range queries and no sorting, a hash index is especially suitable.
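A minimal sketch of such a case, assuming an in-memory table keyed by a high-cardinality column (names are illustrative; MEMORY tables use HASH indexes by default):

```sql
CREATE TABLE session_cache (
    session_id CHAR(32) NOT NULL,
    user_id    INT,
    PRIMARY KEY (session_id) USING HASH
) ENGINE=MEMORY;

-- Pure equality lookup: a single hash computation finds the row.
SELECT user_id FROM session_cache WHERE session_id = 'abc123';
```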

The commonly used InnoDB engine defaults to B+ tree indexes, and it monitors index usage on the table in real time. If it judges that building a hash index would improve query efficiency, it automatically builds one in the in-memory "adaptive hash index buffer" (the adaptive hash index is enabled by default in InnoDB). By observing the search pattern, MySQL builds a hash index using a prefix of the index key. If almost all of a table fits in the buffer pool, building such a hash index can speed up equality queries.

Note: Under some workloads, the speedup from hash-index lookups far outweighs the overhead of monitoring index usage and maintaining the hash table structure. But under high load, the read/write lock protecting the adaptive hash index can itself become a point of contention, for example with highly concurrent joins. LIKE operations and % wildcards also do not benefit from the adaptive hash index, and in such cases it may need to be disabled.
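The adaptive hash index can be inspected and, if it causes contention, disabled at runtime:

```sql
-- Check whether the adaptive hash index is enabled (ON by default):
SHOW VARIABLES LIKE 'innodb_adaptive_hash_index';

-- Disable it under workloads where its internal latch becomes a bottleneck:
SET GLOBAL innodb_adaptive_hash_index = OFF;
```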

6. The difference between a B-tree and a B+ tree

1. In a B-tree, every node stores both keys and data, and all nodes together make up the tree; leaf-node pointers are null, and the leaf nodes carry no key information.

2. In a B+ tree, the leaf nodes contain all the keys together with pointers to the records holding those keys, and the leaf nodes themselves are linked in ascending key order. All non-terminal nodes can be regarded as the index part; each holds only the largest (or smallest) key of its subtree's root. (By contrast, the non-terminal nodes of a B-tree also contain valid data that may be the target of a search.)

7. Why, in practice, is the B+ tree better suited than the B-tree for operating system file indexes and database indexes?

1. The disk read/write cost of a B+ tree is lower. Its internal nodes hold no pointers to record data, so they are smaller than B-tree nodes. If all keys of one internal node are stored in the same disk block, the block can hold more keys, more of the keys being searched are read into memory at once, and the number of I/O operations drops accordingly.

2. The query efficiency of a B+ tree is more stable. Because non-terminal nodes do not point to record data but only index the keys in the leaves, every key lookup must follow a path from the root to a leaf. All lookups therefore have the same path length, giving every record the same query cost.

8. MySQL composite (joint) indexes

1. A composite index is an index on two or more columns. MySQL uses the indexed columns from left to right; a query can use part of the index, but only a leftmost part. For example, the index KEY index(a,b,c) supports searches on the 3 combinations (a), (a,b), and (a,b,c), but not on (b,c). The index is most effective when the leftmost column is compared against a constant.

2. Additional columns in the index narrow the search further, but one index on two columns is different from two separate single-column indexes. A composite index is structured like a phone book, where names consist of a surname and a given name: the book is sorted first by surname, and entries with the same surname are then sorted by given name. If you know the surname, the phone book is very useful; if you know both the surname and given name, it is even more useful; but if you only know the given name and not the surname, the phone book is useless.
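The leftmost-prefix rule above can be sketched as follows (table and index names are illustrative):

```sql
CREATE TABLE t (
    a INT, b INT, c INT, d VARCHAR(20),
    KEY idx_abc (a, b, c)
);

-- These WHERE clauses can use idx_abc (leftmost prefixes a; a,b; a,b,c):
SELECT * FROM t WHERE a = 1;
SELECT * FROM t WHERE a = 1 AND b = 2;
SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3;

-- This one cannot: the leftmost column a is missing.
SELECT * FROM t WHERE b = 2 AND c = 3;
```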

9. When should indexes not be created, or be kept to a minimum?

1. Tables with too few records

2. Tables with frequent inserts, deletes, and updates

3. Columns whose data is highly repetitive and evenly distributed. If a table has 100,000 rows and a column A has only the two values T and F, each occurring with roughly 50% probability, then indexing column A generally will not speed up queries on the table.

4. Columns that are often queried together with the main indexed column when the main column already has many distinct index values

10. What is a table partition?

Table partitioning means decomposing a table in the database into multiple smaller, easier-to-manage parts according to certain rules. Logically there is still only one table, but underneath it consists of multiple physical partitions.

11. The difference between partitioning and splitting into sub-tables

Sub-tables: decomposing one table into multiple different tables according to certain rules; for example, splitting user order records into multiple tables by time.

The difference is that with partitioning there is logically still only one table, while splitting into sub-tables decomposes one table into multiple tables.

12. What are the benefits of table partitioning?

1. Partitioned data can be distributed across different physical devices, making efficient use of multiple hardware devices, and allowing more data to be stored than a single disk or file system could hold.

2. Queries can be optimized. When the WHERE clause contains the partitioning condition, only the relevant partition(s) need to be scanned, improving query efficiency; SUM and COUNT queries can also be processed in parallel across partitions and the results then aggregated.

3. Partitioned tables are easier to maintain. For example, to delete a large batch of data, you can simply clear an entire partition.

4. Partitioned tables can sidestep certain special bottlenecks, such as mutually exclusive access to a single InnoDB index, or inode lock contention in the ext3 file system.
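The maintenance benefit can be sketched in one statement (table and partition names are illustrative, assuming a table partitioned by year):

```sql
-- Removes all rows in partition p2019 almost instantly,
-- instead of a slow, row-by-row batched DELETE.
ALTER TABLE logs TRUNCATE PARTITION p2019;
```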

13. Limitations of partitioned tables

1. A table can have at most 1024 partitions

2. In MySQL 5.1, the partition expression must be an integer or an expression that returns an integer. MySQL 5.5 added support for non-integer partition expressions.

3. If the table has a primary key or unique index, the partitioning columns are constrained by it: every column used in the partitioning expression must be part of every unique key of the table (including the primary key).

4. Foreign key constraints cannot be used in partitioned tables

5. MySQL partitioning applies to all of a table's data and indexes. You cannot partition only the data without the indexes, or only the indexes without the data, nor can you partition only part of the table's data.

14. How to judge whether MySQL currently supports partitioning?

Command: SHOW VARIABLES LIKE '%partition%'
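The check might look like this (the result shown is illustrative; note that the have_partitioning variable exists only in older versions such as MySQL 5.1/5.5):

```sql
SHOW VARIABLES LIKE '%partition%';

-- Illustrative result:
-- +-------------------+-------+
-- | Variable_name     | Value |
-- +-------------------+-------+
-- | have_partitioning | YES   |
-- +-------------------+-------+
```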

If the value of have_partitioning is YES, partitioning is supported.

15. What are the partition types supported by MySQL?

1. RANGE partitioning: data is assigned to partitions by value ranges; for example, a table can be split into several partitions by year.

2. LIST partitioning: data is assigned to partitions according to values in a predefined list. The difference from RANGE is that RANGE partitions over continuous ranges of values, while LIST partitions over a discrete set.

3. HASH partitioning: a hash is computed over one or more columns of the table, and rows are assigned to partitions according to the hash value; for example, a table can be partitioned by a hash of its primary key.

4. KEY partitioning: an extension of HASH partitioning in which the hash function is supplied by the MySQL server itself.
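A sketch of RANGE partitioning by year, as in the example above (names are illustrative):

```sql
CREATE TABLE orders (
    id         INT NOT NULL,
    order_year INT NOT NULL,
    -- The partitioning column must be part of every unique key:
    PRIMARY KEY (id, order_year)
)
PARTITION BY RANGE (order_year) (
    PARTITION p2019 VALUES LESS THAN (2020),
    PARTITION p2020 VALUES LESS THAN (2021),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);
```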

16. The four isolation levels

1. Serializable: avoids dirty reads, non-repeatable reads, and phantom reads.

2. Repeatable read: avoids dirty reads and non-repeatable reads.

3. Read committed: avoids dirty reads.

4. Read uncommitted: the lowest level; nothing is guaranteed.
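The level can be inspected and changed per session; a sketch:

```sql
-- Check the current level (the variable is tx_isolation before MySQL 8.0,
-- transaction_isolation from 8.0 on):
SELECT @@transaction_isolation;

-- InnoDB's default is REPEATABLE READ; change it for the current session:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
```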

17. About MVCC

The MySQL InnoDB storage engine implements a multi-version concurrency control protocol, MVCC (Multi-Version Concurrency Control). (Its counterpart is Lock-Based Concurrency Control.) MVCC's biggest advantage: reads take no locks, so reads and writes do not conflict. In read-heavy OLTP applications this matters greatly and substantially improves the system's concurrency. Today almost all RDBMSs support MVCC.

1. LBCC: Lock-Based Concurrency Control.

2. MVCC: Multi-Version Concurrency Control. A purely lock-based concurrency mechanism offers low concurrency; MVCC improves on it, mainly by increasing concurrency for read operations.

18. In MVCC concurrency control, read operations fall into two categories:

1. Snapshot read: reads the record's visible version (possibly a historical one) without taking any lock, so it does not block other transactions' writes.

2. Current read: reads the latest version of the record and locks the returned record, ensuring that no other transaction modifies it concurrently.
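In InnoDB the two kinds of read correspond to different SQL forms (sketch; the shared-lock syntax shown is the pre-8.0 form, and the table name is illustrative):

```sql
-- Snapshot read: plain SELECT, no locks taken.
SELECT * FROM accounts WHERE id = 1;

-- Current reads: read the latest version and lock it.
SELECT * FROM accounts WHERE id = 1 LOCK IN SHARE MODE;  -- shared lock
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;          -- exclusive lock
-- INSERT, UPDATE, and DELETE are also current reads.
```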

19. Advantages of row-level locking:

1. When many threads access different rows, there are only a few lock conflicts.

2. There are fewer changes to undo when rolling back.

3. A single row can be locked for a long time.

20. Disadvantages of row-level locking:

1. It uses more memory than page-level or table-level locking.

2. When most of the table is used, it is slower than page-level or table-level locking because many more locks must be acquired.

3. If you frequently GROUP BY over most of the data or must frequently scan the whole table, it is noticeably slower than other locks.

4. With higher-level locking, supporting different lock types makes it easier to tune the application, because the locking overhead is lower than with row-level locks.

21. MySQL optimization

1. Turn on the query cache and optimize the query

2. EXPLAIN your SELECT queries. This helps you analyze the performance bottlenecks of your query or table structure. The EXPLAIN output also shows how your indexes and primary keys are used and how the table is scanned and sorted.

3. Use LIMIT 1 when you only need one row. The MySQL engine stops searching after finding the first matching record instead of continuing to look for further matches.

4. Build an index for the search field

5. Use ENUM instead of VARCHAR. If a field such as "gender", "country", "ethnicity", "status", or "department" has a limited, fixed set of values, use ENUM rather than VARCHAR.

6. Prepared statements. Prepared statements are much like stored procedures: a set of SQL statements that runs on the server side. Using them brings benefits in both performance and security: prepared statements check the variables you bind, which helps protect your program against SQL injection attacks.

7. Split tables vertically

8. Choose the right storage engine
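A few of the points above in one sketch (names are illustrative):

```sql
-- Point 5: a fixed value set stored as ENUM instead of VARCHAR.
CREATE TABLE staff (
    id     INT NOT NULL AUTO_INCREMENT,
    name   VARCHAR(50),
    gender ENUM('male', 'female'),
    PRIMARY KEY (id),
    KEY idx_name (name)               -- point 4: index the search field
);

-- Point 3: stop after the first match.
SELECT id FROM staff WHERE name = 'Alice' LIMIT 1;

-- Point 2: inspect how the query uses the index.
EXPLAIN SELECT id FROM staff WHERE name = 'Alice' LIMIT 1;
```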

22. The difference between key and index

1. A key is part of the logical structure of the database. It carries two layers of meaning and function: constraint (enforcing and standardizing the structural integrity of the database) and index (assisting queries). Keys include primary keys, unique keys, foreign keys, and so on.

2. An index is part of the physical structure of the database. It exists only to assist queries; when created, it is stored in a directory-like structure in a tablespace (the InnoDB tablespace in MySQL). Indexes can be classified into prefix indexes, full-text indexes, and so on.

23. What are the differences between MyISAM and InnoDB in MySQL?

The differences:

1. InnoDB supports transactions; MyISAM does not. In InnoDB, each SQL statement is wrapped in a transaction and auto-committed by default, which hurts speed, so it is best to group multiple statements between BEGIN and COMMIT as one transaction.

2. InnoDB supports foreign keys; MyISAM does not. Converting an InnoDB table that contains foreign keys to MyISAM will fail.

3. InnoDB uses a clustered index: the data file is bound to the primary key index, a primary key is required, and lookups through it are very efficient. A lookup through a secondary index, however, needs two searches: first the secondary index to find the primary key, then the primary key index to find the row. The primary key should therefore not be too large, because every secondary index stores it and would grow with it. MyISAM uses non-clustered indexes: the data file is separate, and each index stores a pointer to the row. Its primary key index and secondary indexes are independent of each other.

4. InnoDB does not store the table's exact row count, so SELECT COUNT(*) FROM table requires a full table scan. MyISAM keeps the row count of the whole table in a variable and only has to read it, which is very fast.

5. InnoDB does not support full-text indexes, while MyISAM does, and MyISAM's query efficiency is higher. (Note: InnoDB gained full-text index support in MySQL 5.6.)

How to choose:

1. Do you need transactions? If so, choose InnoDB; if not, MyISAM is an option.

2. If most operations on the tables are read queries, consider MyISAM. If both reads and writes are frequent, use InnoDB.

3. MyISAM is harder to recover after a system crash; can you accept that?

4. Since MySQL 5.5, InnoDB has been MySQL's default engine (previously it was MyISAM), which shows that its advantages are widely recognized. If you don't know which to use, use InnoDB; at least it won't be a bad choice.
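The engine is chosen per table and can be changed later (a sketch with illustrative names):

```sql
-- A read-mostly table:
CREATE TABLE article_archive (
    id   INT PRIMARY KEY,
    body TEXT
) ENGINE=MyISAM;

-- Convert it to InnoDB once transactions are needed:
ALTER TABLE article_archive ENGINE=InnoDB;
```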

24. Notes on database table design

1. Choose field names and fields sensibly

  • Eliminate fields that are not closely related;

  • Field names must follow a convention and carry meaning (do not mix English and pinyin, and avoid meaningless names like abc);

  • Try not to use abbreviations in field names (most abbreviations fail to convey the field's meaning);

  • Do not mix upper and lower case in field names (for readability, connect multiple English words with underscores);

  • Do not use reserved words or keywords in field names;

  • Maintain the consistency of field names and types;

  • Choose numeric types carefully;

  • Leave enough headroom for text fields;

2. Special system fields and follow-up suggestions

  • Add soft-delete markers (such as operator and deletion time);

  • Establish a version mechanism;

3. Configure the table structure sensibly

  • Multi-type fields: check whether any field in the table can be broken down into smaller independent parts (for example, a person field can be split by gender);

  • Multi-valued fields: the data can be split into three tables, making retrieval and sorting better organized and guaranteeing data integrity.

4. Other suggestions

  • Store very large fields in a separate table so they do not affect performance (for example, an introduction field);

  • Use varchar instead of char where appropriate: varchar allocates length dynamically, while char has a fixed specified length;

  • Create a primary key for every table; tables without one hamper queries and index definitions;

  • Avoid NULL in table fields; setting a default value is recommended (for example, default int columns to 0). The efficiency gain in indexed queries is immediately noticeable;

  • Build indexes, preferably on unique, non-null fields. Too many indexes hurt later inserts and updates, so create them based on actual needs;
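A small DDL sketch applying several of these suggestions (names are illustrative):

```sql
CREATE TABLE member (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- every table gets a primary key
    member_name VARCHAR(50) NOT NULL DEFAULT '',       -- default value instead of NULL
    balance     INT NOT NULL DEFAULT 0,                -- int defaults to 0, as suggested
    is_deleted  TINYINT NOT NULL DEFAULT 0,            -- soft-delete marker
    deleted_at  DATETIME NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uk_member_name (member_name)            -- index on a unique, non-null field
) ENGINE=InnoDB;
```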


Origin blog.csdn.net/suifeng629/article/details/106997472