Summary of the most common MySQL interview questions in 2023

Foreword: This article is a summary of key points. You should understand the underlying principles yourself, but in an interview you only need to state the most critical points; if you talk too long, the interviewer will lose patience.

1. The difference between the MyISAM and InnoDB engines in MySQL

        MyISAM does not support transactions or foreign keys; its index structure has only non-clustered indexes, and its locking is table-level only.

        InnoDB supports transactions and foreign keys; its index structure has both clustered and non-clustered indexes, and its locking supports both table locks and row locks. InnoDB is MySQL's default engine.
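
        For illustration, a minimal sketch of choosing the engine per table (the table names here are made up):

        CREATE TABLE logs_myisam (
            id  INT NOT NULL,
            msg VARCHAR(255)
        ) ENGINE = MyISAM;

        CREATE TABLE orders_innodb (
            id     INT PRIMARY KEY,
            amount DECIMAL(10, 2)
        ) ENGINE = InnoDB;

        SHOW ENGINES;                             -- InnoDB is marked DEFAULT
        SHOW TABLE STATUS LIKE 'orders_innodb';  -- the Engine column shows the engine in use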

2. What is an index

        An index is a data structure that speeds up database queries. The most common kinds are the hash index and the B+ tree, and the default index used by the InnoDB storage engine is the B+ tree index.

3. Types of indexes:

        Ordinary index: the most basic index, with no restrictions; its values may repeat or be NULL.

        Unique index: similar to an ordinary index, except that its values cannot repeat, though they may be NULL (a table can have multiple unique indexes).

        Primary key index: a special unique index whose values can neither repeat nor be NULL; a table can have only one primary key index.

        Composite index: an index created jointly over multiple fields; it follows the "leftmost prefix principle".
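
        A sketch of creating each index type, assuming an illustrative user table:

        CREATE TABLE user (
            id    INT NOT NULL AUTO_INCREMENT,
            name  VARCHAR(50),
            email VARCHAR(100),
            age   INT,
            PRIMARY KEY (id)                              -- primary key index: unique, NOT NULL, one per table
        );

        CREATE INDEX idx_name ON user (name);             -- ordinary index: values may repeat or be NULL
        CREATE UNIQUE INDEX uk_email ON user (email);     -- unique index: no duplicates, NULL allowed
        CREATE INDEX idx_name_age ON user (name, age);    -- composite index: follows the leftmost prefix principle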

4. Advantages and disadvantages of indexes:

Advantages: indexes speed up queries, and a unique index additionally guarantees the uniqueness of the data.

Disadvantages: although queries get faster, updates get slower, because every data update must also update the index file.

5. Why not use a hash index:

First: hash collisions are possible.

Second: for an equality query a hash index is more efficient than a B+ tree, because the hash function locates the data in a single step. But for a range query the hash index does not work: a hash value is only a position in an array, not an ordered range of values, so the only way to find all qualifying rows would be a full table scan. A B+ tree keeps its keys sorted, so it can search down the tree to the start of a range and then scan along the sorted leaves. For the same reason a hash index cannot be used for sorting: the hash of a value does not preserve the ordering of the indexed column, while a B+ tree already stores the data in sorted order. Weighing all of this, the B+ tree is the better choice as the InnoDB engine's default index.
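
The difference is easy to demonstrate with MySQL's MEMORY engine, which supports real hash indexes (the names here are made up):

        CREATE TABLE t_hash (
            id  INT,
            val INT,
            INDEX idx_val (val) USING HASH
        ) ENGINE = MEMORY;

        SELECT * FROM t_hash WHERE val = 100;              -- equality: the hash index serves this in one step
        SELECT * FROM t_hash WHERE val BETWEEN 1 AND 100;  -- range: a hash index cannot help, a B+ tree can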

6. Why auto-increment ids are recommended / what is a page split

        MySQL stores data in units of data pages, and the default page size is 16 KB; when a page is full, a new page must be allocated before more data can be written. With an auto-increment id, rows arrive at the B+ tree already in key order, so when the current page fills up, InnoDB simply allocates a new page and keeps appending. If the primary key is not auto-incrementing, each new row must be inserted at its proper position inside the tree, which may force existing rows to move; that movement can trigger a page split, which hurts performance significantly.

7. Advantages and disadvantages of auto-increment id

        Advantages: auto-increment ids give the best insert performance and avoid page splits. An int-type key also takes few bytes, so the non-leaf nodes of the B+ tree can hold more keys and the tree as a whole can index more data.

        Disadvantages: auto-increment ids are hard to use in distributed systems, because tables with independent auto-increment ids cannot simply be merged. If the business grows large enough, the auto-increment value can exceed the column's maximum. And because auto-increment keys are sequential, they are regular and easy for outsiders to guess.
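
        A minimal sketch of an auto-increment primary key (the table name is illustrative):

        CREATE TABLE order_item (
            id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
            product_id INT NOT NULL,
            PRIMARY KEY (id)
        ) ENGINE = InnoDB;

        INSERT INTO order_item (product_id) VALUES (1), (2), (3);  -- ids 1, 2, 3: always appended to the rightmost leaf page
        SELECT LAST_INSERT_ID();                                   -- id generated by the most recent insert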

8. Why is the query performance of the primary key index higher than that of the ordinary index in InnoDB?

        With the primary key index, the row data is found directly in the B+ tree of the primary key index using the primary key.

        With an ordinary (secondary) index, the query first finds the primary key in the secondary index's B+ tree, and then uses that primary key to look up the row in the primary key index's B+ tree. So a primary key lookup searches one tree while a secondary index lookup searches two; the extra lookup is called a back-to-table query.

9. How to avoid back-to-table queries

        Use a covering index to avoid back-to-table queries: make sure every field the query needs is contained in the index. For example, if we create an index on the name field, then SELECT name can be answered entirely from that index, and no back-to-table lookup is needed. But SELECT name, age triggers a back-to-table lookup to fetch age, because name and age do not share a joint index; creating a composite index on (name, age) would cover that query too.
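
        A sketch with EXPLAIN, assuming the illustrative user table from question 3:

        EXPLAIN SELECT name FROM user WHERE name = 'Tom';       -- Extra: Using index (covered, no back-to-table lookup)
        EXPLAIN SELECT name, age FROM user WHERE name = 'Tom';  -- age is not in idx_name, so a back-to-table lookup happens

        CREATE INDEX idx_name_age ON user (name, age);          -- now (name, age) queries are covered too
        EXPLAIN SELECT name, age FROM user WHERE name = 'Tom';  -- Extra: Using index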

10. Leftmost prefix matching principle

        When a composite (joint) index is built, it follows the leftmost prefix matching principle: a query can use the index only by matching fields starting from the leftmost field of the index.

        This is because the composite index is stored as a single B+ tree that is built and sorted by the leftmost index field first. On the leaf nodes, the leftmost field is fully ordered, while the remaining fields are ordered only among rows that share the same leading values, so on their own they appear unordered. Therefore, if the leftmost field of the composite index does not appear in the WHERE condition, the index cannot take effect and a full table scan is performed.
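
        A sketch of which conditions can use a composite index on (a, b, c) (table and column names are made up):

        CREATE INDEX idx_abc ON t (a, b, c);

        SELECT * FROM t WHERE a = 1;                      -- uses the index: leftmost field present
        SELECT * FROM t WHERE a = 1 AND b = 2;            -- uses the index
        SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3;  -- uses the index fully
        SELECT * FROM t WHERE b = 2 AND c = 3;            -- leftmost field missing: index not used, full table scan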

11. Clustered index and non-clustered index

        Clustered index: the data and the index are stored together; the leaf nodes of the B+ tree hold the full row data.

        InnoDB builds the clustered index on the primary key. If the table has no primary key, it uses the first unique non-null index instead; if there is no such index either, InnoDB implicitly generates a hidden row id to serve as the clustered index key. So an InnoDB table always has a clustered index, and it can only have one.

        Non-clustered (secondary) index: the data and the index are stored separately; the leaf nodes of its B+ tree store the primary key value of the corresponding row. The indexes we add in day-to-day work are all secondary indexes: a secondary index lookup finds the primary key, and the primary key is then used on the clustered index to find the corresponding row data.

12. Index invalidation

1. With OR, the fields on both sides of the OR must all be indexed, otherwise the index becomes invalid.
2. In a fuzzy query, the percent sign must be at the end: LIKE 'abc%' can use the index, LIKE '%abc' cannot.
3. If the column type is a string, the value in the query condition must be quoted; otherwise an implicit type conversion occurs and the index is not used.
4. Violating the leftmost prefix matching principle.
5. Using not-equal comparisons, or performing arithmetic or function operations on the index column.
6. If MySQL estimates that a full table scan is faster than using the index, it will not use the index.
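
Illustrative queries for the cases above, assuming indexes on name and age and an unindexed phone column:

        SELECT * FROM user WHERE name = 'Tom' OR phone = '123';  -- case 1: phone is unindexed, so the whole OR becomes a full scan
        SELECT * FROM user WHERE name LIKE '%Tom';               -- case 2: leading % defeats the index ('Tom%' would be fine)
        SELECT * FROM user WHERE name = 123;                     -- case 3: string column vs number forces an implicit cast, no index
        SELECT * FROM user WHERE age + 1 = 20;                   -- case 5: expression on the column, no index; rewrite as age = 19
        SELECT * FROM user WHERE age != 20;                      -- case 5: not-equal usually falls back to a full scan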

13. MySQL optimization

I think MySQL optimization should start with the SQL statements. MySQL can also be optimized by splitting databases and tables, improving the network, or upgrading server hardware, but daily development still revolves around SQL statements, so we give priority to optimizing them.
When a SQL statement we wrote runs slowly, we can use EXPLAIN to view its execution plan, check whether an index is used, and then optimize accordingly. In my daily development I avoid SELECT * and list the specific fields instead. For indexed fields, I avoid the index-invalidation cases and follow the leftmost prefix matching principle. I try not to use subqueries for joined queries; when using outer joins, let the small table drive the large table. If a joined query is still too slow, the tables can be queried separately and the results combined in the Java code.
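
A sketch of reading an EXPLAIN plan (the query itself is illustrative):

        EXPLAIN SELECT id, name FROM user WHERE name = 'Tom';
        -- columns worth checking in the output:
        --   type:  ref/range is good, ALL means a full table scan
        --   key:   the index actually chosen (NULL means none)
        --   rows:  estimated rows examined; smaller is better
        --   Extra: "Using index" = covering index, "Using filesort" = extra sort work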

14. Transactions and four characteristics

A transaction is an independent unit of work; the operations inside it either all succeed or all fail

Four characteristics of transactions (ACID):

Atomicity, Consistency, Isolation, Durability

Atomicity: the operations in a transaction either all succeed, or, if any one of them fails, they all fail and are rolled back.

Consistency: the data must be consistent before and after the transaction executes. For example, if accounts A and B hold 500 between them, then no matter how many times they transfer money to each other, after every transaction their two balances must still add up to 500.

Isolation: when multiple users operate on a table concurrently, the database opens an independent transaction for each user, and concurrent transactions must be isolated from one another.

Durability: once a transaction is committed, its changes to the data in the database are permanent.
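
The transfer example from Consistency, written as a transaction (the account table is made up):

        START TRANSACTION;
        UPDATE account SET balance = balance - 100 WHERE id = 'A';
        UPDATE account SET balance = balance + 100 WHERE id = 'B';
        COMMIT;    -- both updates become permanent together; ROLLBACK would undo both (atomicity)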

15. Dirty read, phantom read, non-repeatable read

Dirty read: a transaction reads data that another transaction has modified but not yet committed; if that other transaction rolls back, the data that was read never really existed

Non-repeatable read: transaction A reads some data; while A is still processing it, transaction B modifies that data and commits; when A reads the same data again, the two reads do not match. This is a non-repeatable read

Phantom read: transaction A queries rows matching some condition; while A is processing them, transaction B inserts new rows matching the same condition and commits; when A runs the same query again, B's inserted rows appear in the result. This is a phantom read
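
A sketch of a non-repeatable read as a two-session transcript, under READ COMMITTED (the account table is made up):

        -- Session A                                   -- Session B
        START TRANSACTION;
        SELECT balance FROM account WHERE id = 1;   -- returns 500
                                                       START TRANSACTION;
                                                       UPDATE account SET balance = 400 WHERE id = 1;
                                                       COMMIT;
        SELECT balance FROM account WHERE id = 1;   -- returns 400: same query, different result
        COMMIT;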

16. Isolation levels

Read uncommitted: transaction A can read transaction B's uncommitted changes

Read committed: transaction A can only read changes that transaction B has already committed

Repeatable read: MySQL's default isolation level; it guarantees that repeated reads of the same rows within one transaction return the same data

Serializable: the highest isolation level; it eliminates phantom reads by forcing transactions to execute serially
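
Checking and changing the isolation level in MySQL:

        SELECT @@transaction_isolation;   -- REPEATABLE-READ by default (the variable is tx_isolation on pre-8.0 versions)
        SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
        SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;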

17. Understanding of MVCC:

        MVCC, multi-version concurrency control, is a method of managing concurrent access whose main goal is to improve the database's concurrency. Under the read committed and repeatable read isolation levels, a SELECT reads data from the version chain, which allows other transactions to modify the same row at the same time; their modifications are in turn saved into the version chain. Reads and writes can therefore execute concurrently, which improves the performance of the database.

        "Read" here means a snapshot read, not a current read. MVCC cannot be used under the read uncommitted and serializable isolation levels, because read uncommitted always reads the latest version, and serializable locks both reads and writes.

18. The principle of MVCC:

        Before explaining the principle of MVCC, first the version chain. The version chain is composed of two important parts: the roll pointer and the undo log. The roll pointer is a hidden field in the row data that points to the row's previous version, and the undo log holds all the historical versions of the row. The version chain can contain many records; which one a read sees is decided by the ReadView. A ReadView has four important fields: the list of transaction ids that were active (not yet committed) when the ReadView was generated; the smallest id in that list; the id the database would assign to the next transaction at that moment; and the id of the transaction that generated the ReadView. Specific visibility rules then decide which record in the chain is read.

        The biggest difference between the read committed and repeatable read isolation levels under MVCC is when the ReadView is generated. Under read committed, every SELECT generates a new ReadView, which is exactly why non-repeatable reads occur at that level. Under repeatable read, no matter how many SELECTs run in the same transaction, they all reuse the ReadView generated by the first SELECT, and that is what guarantees repeatable reads.
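
The difference as a two-session sketch under REPEATABLE READ (compare with the READ COMMITTED transcript in question 15):

        -- Session A                                   -- Session B
        START TRANSACTION;
        SELECT balance FROM account WHERE id = 1;   -- returns 500; the ReadView is generated here
                                                       UPDATE account SET balance = 400 WHERE id = 1;
                                                       COMMIT;
        SELECT balance FROM account WHERE id = 1;   -- still returns 500: the first ReadView is reused
        COMMIT;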

19. Locks

A lock is a mechanism that guarantees the consistency and integrity of data when the database is accessed concurrently

By granularity: table locks, row locks

A row lock locks a single row of data. Row locks come in two kinds, shared locks and exclusive locks:

Shared lock: when a row holds a shared lock, other transactions can still read the row or add their own shared locks to it; a transaction that wants to modify the row must wait until all shared locks on it are released.

        If a transaction adds a shared lock to a row, that row can be read both in this transaction and in others, and other transactions can also add shared locks to it, but while shared locks exist the row cannot be modified; a writer must wait until all the shared locks are released. (One special case: if transaction A alone holds the shared lock, A can still modify the row itself; the lock is automatically upgraded to an exclusive lock when the modification runs.)

Exclusive lock: when a transaction adds an exclusive lock to a row, other transactions cannot add any lock to that row. However, a plain SELECT takes no lock by default, so other transactions can still query the row.
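
The explicit row-lock syntax, for reference (the table is made up):

        SELECT * FROM account WHERE id = 1 LOCK IN SHARE MODE;  -- shared (S) lock; spelled FOR SHARE in MySQL 8.0
        SELECT * FROM account WHERE id = 1 FOR UPDATE;          -- exclusive (X) lock
        SELECT * FROM account WHERE id = 1;                     -- plain SELECT: no lock, a snapshot read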

 

Table locks:

        Table-level locks also include shared locks and exclusive locks, used much like their row-level counterparts except that they apply to the whole table. Tables additionally have intention locks, which come as intention shared locks and intention exclusive locks. Intention locks are maintained by InnoDB itself; their purpose is to let InnoDB quickly determine whether any row in a table is locked without scanning every row, which improves performance.

        Intention shared lock: before a transaction adds a shared lock to a row, it must first obtain the table's intention shared lock.

        Intention exclusive lock: before a transaction adds an exclusive lock to a row, it must first obtain the table's intention exclusive lock.


 

Pessimistic locks and optimistic locks (this discussion also applies to Java)

        Pessimistic locks and optimistic locks are abstract concepts of locking rather than concrete lock types. A pessimistic lock assumes the probability that concurrent access will modify the data is high, so it locks before modifying, a conservative "acquire the lock before accessing" strategy. An optimistic lock assumes the data will normally not be modified, so it does not lock when reading; only when updating does it check whether the data has been modified by someone else, updating if not and retrying if so. This check is usually based on a version number, but that approach can suffer from the ABA problem.

        ABA: if the optimistic check compares the value itself, then a value that starts as A, is changed to B by another thread, and is then changed back to A will pass the check, and the optimistic lock has no way of knowing the data was ever modified. This is the ABA problem; it is usually avoided with a version number that only ever increases.
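
        A common SQL sketch of version-number optimistic locking (the table and columns are made up):

        -- first read: SELECT stock, version FROM product WHERE id = 1;  suppose it returns version = 5
        UPDATE product
        SET    stock = stock - 1, version = version + 1
        WHERE  id = 1 AND version = 5;
        -- if the affected row count is 0, another writer got there first: re-read and retry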

        Gap lock: gap locks solve the phantom read problem under the repeatable read isolation level. If every read in a transaction is a snapshot read, phantom reads cannot occur; they appear when snapshot reads are used together with current reads. A gap lock locks a range of key values so that nothing can be inserted into the gap; the locked range is an interval that is open on both ends.
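
        A sketch of a gap lock taken by a current read under REPEATABLE READ (the table is made up; id is indexed):

        START TRANSACTION;
        SELECT * FROM t WHERE id BETWEEN 10 AND 20 FOR UPDATE;  -- locks the matching rows and the gaps around them
        -- another transaction's INSERT INTO t (id) VALUES (15) now blocks until COMMIT, so no phantom can appear
        COMMIT;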

Origin blog.csdn.net/qq_26112725/article/details/130677160