MySQL interview: classic questions

1. Talk about the three normal forms

  • "First Normal Form" : every field in the database is "atomic" — it cannot be subdivided further and holds a single value

  • "Second Normal Form" : "built on top of the first normal form." It requires that every row in a table "be uniquely distinguishable." To achieve this, a column is usually added to store a unique identifier for each row; this unique column is called the primary key

  • "Third Normal Form" : "built on top of the first and second normal forms." It ensures that every non-key column depends directly on the primary key, rather than indirectly through non-primary-key columns of other tables.

In day-to-day development, however, "not every table has to satisfy all three normal forms." A few redundant fields can sometimes reduce the number of tables that must be joined, and the resulting gain in query efficiency can be dramatic.

2. What is the difference between MyISAM and InnoDB?

  • 1. "InnoDB supports transactions; MyISAM does not."

  • 2. "InnoDB supports foreign keys; MyISAM does not."

  • 3. "InnoDB uses a clustered index" , with B+Tree as the index structure. The data file is part of the primary-key index, so the table must have a primary key. "MyISAM uses non-clustered indexes" , also with B+Tree as the index structure, but the index and data files are separate: the index stores pointers into the data file, and the primary-key index and secondary indexes are independent of each other.

  • 4. "InnoDB does not store the exact row count of a table." "MyISAM keeps a variable holding the total row count" , so select count(*) without a where clause is instant on MyISAM.

  • 5. InnoDB has a "redolog" log file; MyISAM does not.

  • 6. "InnoDB stores a table as .frm and .ibd files, while MyISAM uses .frm, .MYD, and .MYI files."

    • InnoDB: .frm is the table-definition file, .ibd holds the data (and indexes)

    • MyISAM: .frm is the table-definition file, .MYD holds the data, .MYI holds the indexes

  • 7. "InnoDB supports both table-level and row-level locks, while MyISAM supports only table-level locks."

  • 8. "InnoDB must have a unique index (the primary key)." If none is specified, InnoDB generates a hidden Row_id column to serve as the default primary key. "MyISAM has no such requirement."

3. Why is it recommended to use auto-incrementing id as the primary key?

  • 1. A secondary (ordinary) index stores the primary-key value in its B+ tree. The larger the primary key, the "larger the storage footprint of every ordinary index."

  • 2. With an auto-incrementing id as the primary key, newly inserted rows simply go at the end of the last page; they are "inserted in order" with no extra maintenance.

  • 3. Page splits stay cheap. A page split happens when the page a row should go into is already full. If the primary key is not an auto-incrementing id, rows may be inserted into the middle of existing pages, so page contents shift frequently, "resulting in higher page-split maintenance costs."
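A toy model makes point 3 concrete. Tiny pages and a tail-append shortcut for monotonic keys (the optimization real B+ trees apply to auto-increment inserts) are assumptions of this sketch, not InnoDB's actual algorithm:

```python
import bisect
import random

PAGE_CAP = 4  # tiny page capacity so splits are easy to trigger

def insert_all(keys):
    pages = [[]]          # each page is a sorted list of keys
    splits = moved = 0
    for k in keys:
        # first page whose largest key >= k owns this key (else the last page)
        i = next((j for j, p in enumerate(pages) if p and p[-1] >= k),
                 len(pages) - 1)
        if len(pages[i]) < PAGE_CAP:
            bisect.insort(pages[i], k)
        elif i == len(pages) - 1 and k > pages[i][-1]:
            pages.append([k])         # tail append: fresh page, nothing moved
        else:
            half = PAGE_CAP // 2      # real split: move half the keys right
            pages.insert(i + 1, pages[i][half:])
            pages[i] = pages[i][:half]
            moved += PAGE_CAP - half
            splits += 1
            dest = pages[i] if k <= pages[i][-1] else pages[i + 1]
            bisect.insort(dest, k)
    return splits, moved

seq = insert_all(range(1000))                       # auto-increment primary key
random.seed(42)
rnd = insert_all(random.sample(range(1000), 1000))  # random primary key
print(seq, rnd)   # sequential: (0, 0); random: hundreds of splits
```

Sequential keys never force a mid-page split, so no keys are ever moved; random keys trigger splits constantly, each one copying half a page.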

4. How is a query statement executed?

  • 1. "Establish a connection" with the client through the connector

  • 2. Look up the "query cache" to check whether this exact SQL has been executed before (note that the query cache was removed in MySQL 8.0)

    • If yes, the result will be returned directly

    • If not, proceed to step three

  • 3. Use the analyzer to "check the SQL's syntax and semantics" — formats, table and column names, etc.

  • 4. "Optimize the statement" through the optimizer, e.g. choosing which index to use and the join order of the joined tables

  • 5. "Verify permissions" to verify whether there is query permission for the table

    • If not, a no permission error will be returned.

    • If yes, proceed to step six

  • 6. Call the storage engine through the executor to execute the sql, and then return the "execution result"
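The six steps above can be sketched as a toy pipeline. The cache, parser, permission, and engine logic here are illustrative stand-ins, not MySQL's real implementation:

```python
# Toy model of the query path: connector -> cache -> analyzer -> optimizer
# -> permission check -> executor/engine. All logic is a simplified stand-in.
def run_query(sql, cache, table_privs, engine):
    if sql in cache:                                   # 2. query cache hit
        return cache[sql]
    if not sql.lower().startswith("select"):           # 3. analyzer: crude check
        raise ValueError("syntax error")
    plan = sql                                         # 4. optimizer (no-op here)
    if not table_privs.get("t", False):                # 5. permission check
        raise PermissionError("no SELECT privilege on t")
    result = engine(plan)                              # 6. executor -> engine
    cache[sql] = result
    return result

cache = {}
rows = run_query("select * from t", cache, {"t": True}, lambda plan: ["row1"])
print(rows)
# a repeated query is answered from the cache without touching the engine
again = run_query("select * from t", cache, {"t": True}, lambda plan: None)
print(again)
```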

5. How is an update statement executed under InnoDB?

Take the following statement as an example; column c has no index and id is the primary key.

update T set c=c+1 where id=2;
  • 1. The executor asks the engine for the row with id=2. Since id is the primary key, the engine finds the row directly via a tree search on the primary-key index

    • If the data page containing the id=2 row is already "in memory" , it is "returned directly" to the executor

    • If it is "not in memory" , it must first be "read from disk into memory" , then "returned"

  • 2. The executor takes the row handed back by the engine and adds 1 to the value — say it was N and is now N+1 — producing a new row, then calls the engine interface to "write this new row of data"

  • 3. The engine updates the row in memory and "records the update in the redo log" ; at this point the redo log entry is in the "prepare" state. It then notifies the executor that execution is finished and the transaction can be committed at any time

  • 4. The executor "generates the binlog for this operation" and "writes the binlog to disk"

  • 5. The executor calls the engine's "commit transaction" interface; the engine switches the redo log entry just written to the commit state, and the "update is complete"

6. Why does an InnoDB transaction require two-phase commit?

  • Write the redolog first, then the binlog. Suppose MySQL crashes and restarts after the redolog is written but before the binlog is. Crash recovery replays the redolog, so the local value of c becomes 1 — but the statement was never recorded in the binlog. If that binlog is later used to restore a temporary instance, this update is missing due to the "binlog loss" , so the restored row still has c = 0, different from the original database.

  • Write the binlog first, then the redolog. If there is a crash right after the binlog is written, the redolog knows nothing about the transaction, so after crash recovery the transaction is invalid and the local value of c is 0. But the binlog already contains the record "change c from 0 to 1", so a later restore from the binlog "replays one extra transaction" : the restored row has c = 1, again different from the original database.

As you can see, "without two-phase commit, the state of the database can end up inconsistent with the state of an instance restored from its logs."
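The two crash points can be replayed with a toy model. Python lists stand in for the redo log and binlog; this is a sketch of the recovery rule, not InnoDB's actual code. Under two-phase commit, recovery commits a prepared redo entry only if its binlog record exists, so both the crashed instance and a binlog-restored copy agree:

```python
# Toy two-phase commit recovery. A redo entry is written in "prepare" state;
# crash recovery commits it only if the matching binlog record reached disk,
# otherwise the change is rolled back. c starts at 0; the txn sets it to 1.
def recover(redo, binlog):
    logged = {entry["txid"] for entry in binlog}
    c = 0
    for entry in redo:
        if entry["state"] == "commit" or entry["txid"] in logged:
            c = entry["value"]          # commit the prepared change
    return c

def restore_from_binlog(binlog):
    c = 0
    for entry in binlog:                # replay archived operations in order
        c = entry["value"]
    return c

# crash point: redo log written (prepare state), binlog never written
crashed = recover([{"txid": 1, "value": 1, "state": "prepare"}], [])
restored = restore_from_binlog([])
print(crashed, restored)                # 0 0 -> consistent: txn rolled back

# no crash: prepare record plus binlog record -> both sides see the change
crashed2 = recover([{"txid": 1, "value": 1, "state": "prepare"}],
                   [{"txid": 1, "value": 1}])
restored2 = restore_from_binlog([{"txid": 1, "value": 1}])
print(crashed2, restored2)              # 1 1 -> consistent
```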

7.What is an index?

Think back to looking up a word in a dictionary when you were learning Chinese characters as a child. You scan the a–z directory by the word's initial letters and jump straight to its page number. Without that table of contents, the worst case is leafing through the dictionary page by page all the way to the end before finding the word.

The index is "equivalent to the directory in our dictionary" and can greatly improve our query efficiency in the database.

8. What are the scenarios for index failure?

Here are just a few; different MySQL versions differ in the details.

  • 1. The leftmost prefix rule (the leading column must be present, and no middle column may be skipped)

  • 2. Do not perform any operation on an indexed column (calculations, functions, automatic/manual type conversion); otherwise the index is abandoned in favor of a full table scan.

  • 3. Columns to the right of a range condition (between, <, >, in, etc.) in a composite index cannot use the index — for example, assuming a composite index on (c, b), in:

select a from user where c > 5 and b = 4;

only the range on c can use the index; b cannot.
  • 4. Using != or <> on an indexed column causes the index to fail, falling back to a full table scan.

  • 5. Using is null / is not null on an indexed column can cause the index to fail, falling back to a full table scan.

  • 6. A like pattern on an indexed column that starts with a wildcard ('%string') causes the index to fail, falling back to a full table scan — this is the leftmost prefix principle again.

  • 7. If an indexed column is a string but the query omits the single quotes, implicit type conversion causes the index to fail, falling back to a full table scan.

  • 8. Using or on an indexed column can cause the index to fail and fall back to a full table scan (unless every branch of the or is itself indexed).
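Scenario 2 — operating on the indexed column — is easy to observe. SQLite stands in for MySQL here (the plan text shown is SQLite's, but MySQL abandons the index in the same situation):

```python
# Wrapping an indexed column in a function defeats the index: the engine
# can no longer seek into the B+ tree and falls back to a full scan.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table user(id integer, name text)")
db.execute("create index idx_name on user(name)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the plan text in their last column
    return " ".join(row[-1] for row in db.execute("explain query plan " + sql))

direct = plan("select id from user where name = 'a'")
wrapped = plan("select id from user where lower(name) = 'a'")
print(direct)    # SEARCH ... USING INDEX idx_name
print(wrapped)   # SCAN ... (index not used)
```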

9. Why use a B+ tree instead of a B tree?

A B+ tree stores data only in its leaf nodes; non-leaf nodes hold keys alone, no row data. Since a node corresponds to one fixed-size disk page, key-only internal nodes "can hold more index entries" per page, giving a larger fanout, a shorter tree, smaller internal nodes, and fewer disk IOs per lookup. Queries are also more stable, because every lookup travels the same distance down to a leaf. Finally, the leaf nodes are linked together, so a range query — very frequent in databases — just walks the leaf list instead of re-traversing the tree, which is far more efficient.
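The fanout argument can be put into numbers. The 16 KB page is InnoDB's default; the bigint key, 6-byte page pointer, and 200-byte average row are illustrative assumptions:

```python
import math

PAGE = 16 * 1024          # InnoDB default page size
KEY_PTR = 8 + 6           # bigint key + 6-byte child-page pointer (assumed)
ROW = 200                 # assumed average row size

# B+ tree internal nodes hold only (key, pointer) pairs; a B tree's internal
# nodes would also hold the row itself, shrinking the fanout drastically.
bplus_fanout = PAGE // KEY_PTR
btree_fanout = PAGE // (KEY_PTR + ROW)

rows = 100_000_000        # tree height needed to index 100 million rows
bplus_height = math.ceil(math.log(rows) / math.log(bplus_fanout))
btree_height = math.ceil(math.log(rows) / math.log(btree_fanout))
print(bplus_fanout, btree_fanout)   # 1170 vs 76 children per node
print(bplus_height, btree_height)   # 3 vs 5 levels of disk IO per lookup
```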

10. What is WAL? What are its benefits?

WAL is Write-Ahead Logging: "every modification is first written to the log, and only later written to the data files on disk." It is used to guarantee the atomicity and durability of data operations.

Benefits:

  • 1. "Reading and writing can be executed completely concurrently" and will not block each other.

  • 2. The log is written first, so disk writes change "from random writes to sequential writes" , reducing client-visible latency. And because sequential writes tend to fall within the same disk block, the number of IOs is also greatly reduced.

  • 3. With the log safely on disk, "the data can be recovered from the log if the database crashes."
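A minimal sketch of the write-ahead idea — a Python list stands in for the sequential on-disk log, and a dict for in-memory pages; this is the principle, not InnoDB's actual log format:

```python
# Write-ahead rule: append the change to the log BEFORE touching the data,
# so that after a crash the data can be rebuilt by replaying the log.
class MiniWAL:
    def __init__(self):
        self.log = []      # sequential on-disk log (survives the "crash")
        self.data = {}     # in-memory pages (lost in the "crash")

    def set(self, key, value):
        self.log.append((key, value))   # 1. log first (sequential write)
        self.data[key] = value          # 2. then modify the page

    def recover(self):
        self.data = {}                  # memory was wiped by the crash
        for key, value in self.log:     # replay the log in order
            self.data[key] = value
        return self.data

db = MiniWAL()
db.set("c", 1)
db.set("c", 2)
db.data.clear()            # simulate a crash that loses all memory state
print(db.recover())        # {'c': 2} — the logged state comes back
```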

11. What is a table return (回表)?

A table return means a query first scans a secondary-index tree to find the matching entry and obtain the primary-key id, then uses that id to fetch the full row from the primary-key index tree. In other words, a query through a non-primary-key index has to scan one extra index tree.

12.What is index pushdown?

Index condition pushdown means that when there are filter conditions on indexed columns, the MySQL server passes those conditions down to the storage engine; the engine evaluates them against the index entries first and "retrieves and returns a row to the MySQL server only when the index entry satisfies the conditions" , cutting down on table returns.

13.What is a covering index?

A covering index means a query can be answered entirely from the index, without reading the data table, which reduces the number of table returns. For example:

select id from t where age = 1;

id is the primary key and age is an ordinary index. The age index tree already stores the primary-key id in its leaves, so the result can be returned directly from the index.
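SQLite can stand in for MySQL to show the difference in the query plan — its "COVERING INDEX" marker means no table return is needed (MySQL's EXPLAIN shows "Using index" for the same situation):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table t(id integer primary key, age integer, name text)")
db.execute("create index idx_age on t(age)")

def plan(sql):
    return " ".join(row[-1] for row in db.execute("explain query plan " + sql))

# id (the primary key) is stored in every secondary-index entry, so this
# query is answered from idx_age alone — no table return
covered = plan("select id from t where age = 1")
# name is not in the index, so each match must go back to the table
not_covered = plan("select name from t where age = 1")
print(covered)
print(not_covered)
```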

14.What is the leftmost prefix principle?

The leftmost prefix principle concerns which columns of a composite index the where conditions can use: the index is matched from left to right, in the order the columns were defined. Once a column is skipped, that column and every column to its right in the index can no longer use the index.

For example, "There is a combined index of (a, b, c)"

where a = 1 and b = 1

At this time, a and b will hit the combined index

where a = 1 and c = 1

At this time, a will hit the combined index, but c will not

where b = 1 and c = 1

The combined index will not be hit at this time
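The three cases above can be checked with EXPLAIN. SQLite stands in for MySQL here and follows the same leftmost prefix rule; selecting d, a column outside the index, keeps SQLite from answering any query as a covering-index scan, which would muddy the comparison:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table t(a integer, b integer, c integer, d text)")
db.execute("create index idx_abc on t(a, b, c)")

def plan(sql):
    return " ".join(row[-1] for row in db.execute("explain query plan " + sql))

p1 = plan("select d from t where a = 1 and b = 1")  # a and b both use the index
p2 = plan("select d from t where a = 1 and c = 1")  # only a uses the index
p3 = plan("select d from t where b = 1 and c = 1")  # index unusable: full scan
print(p1)
print(p2)
print(p3)
```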

15. How to choose between an ordinary index and a unique index?

  • Query

    • With an ordinary index as the condition, after the first match the engine keeps scanning until it meets a value that no longer matches (for a point lookup, this extra cost is tiny).

    • With a unique index as the condition, the engine stops and returns as soon as the first match is found.

  • Update

    • An ordinary index can simply record the operation in the change buffer and finish.

    • A unique index must first check whether the data conflicts, which requires reading the target page into memory, so it cannot use the change buffer.

Therefore, "unique indexes are better suited to query scenarios, and ordinary indexes to insert/update scenarios."

16.What is a transaction? What are its characteristics?

A transaction is a series of operations that must all succeed as a unit; if any one of them fails, the whole set of operations fails.

Characteristics

  • "1. Atomicity" : Either all executions are successful, or none are executed.

  • "2. Consistency" : The integrity of the data before and after the transaction must be consistent.

  • "3. Isolation" : Isolation means that when multiple transactions are triggered at the same time, they cannot be interfered by the operations of other transactions. Multiple concurrent transactions must be isolated from each other.

  • "4. Durability" : Changes after the transaction is completed are permanent.

17. What is the isolation level of transactions?

  • 1. "Read committed" : a transaction can only "read data that has already been committed"

  • 2. "Read uncommitted" : a transaction can "read data that has not yet been committed"

  • 3. "Repeatable read" : within one transaction, the data read at the start stays consistent with "the same batch of data read at any later point" before the transaction ends

  • 4. "Serializable" : the highest transaction isolation level — no matter how many transactions there are, they are "executed one by one in order"

  • "Dirty reading"

    • A dirty read means "reading data that another transaction has not yet committed." Uncommitted data may still be rolled back, meaning it may never actually exist in the database. Reading data that may not ultimately exist is called a dirty read.

  • "Non-repeatable reading"

    • In contrast to repeatable read, a non-repeatable read means that within the same transaction, "the same batch of data read at different times may differ."

  • "Phantom reading"

    • Phantom reads concern insert (INSERT) operations. Suppose transaction A has modified the content of some rows but not yet committed, and transaction B then inserts rows identical to the records as they were before A's change, committing before A does. If A now queries, it will find that "its changes seem not to have taken effect on some of the data" — when in fact those rows were newly inserted by B. This is called a phantom read.

18.What does binlog do?

Binlog is the archive log. It belongs to the server layer, is stored in a binary format, and is used to "record the update statements the user executed against the database."

Main uses

  • master-slave replication

  • Data Recovery

19.What does undolog do?

Undolog is an InnoDB storage-engine log used to guarantee the atomicity of data. "It saves a version of the data from before the transaction — i.e. it records the pre-modification data — which can be used for rollback." It also provides the reads behind multi-version concurrency control (MVCC).

Main uses

  • transaction rollback

  • Implement multi-version control (MVCC)

20.What does relaylog do?

Relaylog is the relay log, "used during master-slave synchronization." It is an intermediate, temporary log file that stores the binlog content received from the master node.

After the master's binlog is transferred to the slave, the slave's I/O thread writes it into the relay log as a local file. The slave's SQL thread then reads the relay log and replays its contents locally, which keeps the slave's data consistent with the master's.

21.What does redolog do?

Redolog is "a log unique to the InnoDB storage engine." It records the changes made by transaction operations — specifically, the values after modification — and records them whether or not the transaction has committed.

It enables "data recovery and provides crash-safe capability."

When an insert, delete, or update comes in, InnoDB first records it in the redo log and modifies the data in the cached page. "The changed data is only really flushed to disk later, for example when MySQL is idle."

22.How does redolog record logs?

InnoDB's redo log has a fixed size. For example, it can be configured as a group of 4 files of 1GB each, recording 4GB of operations in total. "It is written from the beginning; upon reaching the end, it wraps around and keeps writing in a circle."

Therefore, if the log fills up before the dirty pages it covers have been flushed to disk, updates must stall while MySQL flushes — the "memory jitter" phenomenon. From the outside, MySQL appears to hang for a moment; it is busy flushing to disk during that time.
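The write-in-a-circle behavior is just a ring buffer: a write position chases a checkpoint. A sketch (byte counts and method names are illustrative, not InnoDB's real interface):

```python
class RedoRing:
    """Fixed-capacity redo log: write_pos chases the checkpoint in a circle."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.write_pos = 0    # total log bytes ever written
        self.checkpoint = 0   # bytes whose dirty pages are already on disk

    def write(self, size):
        if self.write_pos + size - self.checkpoint > self.capacity:
            return False      # log full: this is where MySQL stalls to flush
        self.write_pos += size
        return True

    def flush_pages(self, size):
        # flushing dirty pages lets the checkpoint advance, freeing log space
        self.checkpoint = min(self.checkpoint + size, self.write_pos)

ring = RedoRing(capacity=4)
results = [ring.write(1) for _ in range(5)]
print(results)        # [True, True, True, True, False] — the fifth write stalls
ring.flush_pages(2)   # flushing dirty pages reclaims log space
print(ring.write(1))  # True — writing can resume
```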

23.What is the difference between redolog and binlog?

  • 1. "redolog" is an "InnoDB" -specific log, while "binlog" belongs to the "server" layer and is available to all storage engines.

  • 2. "redolog" records "physical values" — what modification was made to which page — while "binlog" records the logical "operation content".

  • 3. When a "binlog" file reaches its size limit, or on flush logs, "a new file is created" , while "redolog" has a fixed size and can only be written cyclically.

  • 4. "binlog has no crash-safe capability" and serves only as an archive; redolog does have crash-safe capability.

24. Let’s talk about mvcc, what is its function?

MVCC, multi-version concurrency control, is a common way for modern database engines (MySQL, Oracle, PostgreSQL, etc.) to handle read-write conflicts. Its purpose is to "improve throughput in highly concurrent database scenarios."

Under the MVCC protocol, every read operation sees a consistent snapshot — "a snapshot of the whole database" — enabling non-blocking reads; it is used to "support the implementation of the read committed and repeatable read isolation levels."

MVCC lets data carry multiple versions, keyed by a timestamp or a globally increasing transaction ID, so at the same point in time different transactions can see different data. The superseded versions are "recorded in the undolog."
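The core visibility rule can be sketched in a few lines — a toy read view; the field names and the commit-set representation are illustrative, not InnoDB's actual structures:

```python
# A row's versions form a chain, newest first, each tagged with the writer's
# transaction id. A read view built at snapshot time sees the newest version
# whose writer had already committed at that moment.
def read(version_chain, committed_at_snapshot):
    for writer_txid, value in version_chain:
        if writer_txid in committed_at_snapshot:
            return value                 # visible: writer committed before us
    return None                          # no visible version yet

# txn 10 committed c=1; txn 20 then wrote c=2 but has not committed
versions = [(20, 2), (10, 1)]

early_snapshot = read(versions, committed_at_snapshot={10})
late_snapshot = read(versions, committed_at_snapshot={10, 20})
print(early_snapshot, late_snapshot)   # 1 2 — reads never block on txn 20
```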

25. What is the reason why a Sql statement query is always slow?

  • "1. No index is used"

    • For example, the index fails due to the function, or the index itself is not added.

  • "2. The amount of table data is too large"

    • Consider sub-database and sub-table

  • "3. The optimizer selected the wrong index"

    • "Consider using" force index to force the intended index

26. What is the reason why a Sql statement query is occasionally slow?

  • "1. The database is flushing dirty pages"

    • For example, "the redo log is full" , or "memory is insufficient" and pages being evicted are dirty and must be flushed first; normally MySQL "flushes dirty pages in the background when idle"

  • "2. The query could not acquire a lock"

27.How does Mysql synchronize data between master and slave?

  • 1. The master writes this update, as a binlog event, into the master's binlog file.

  • 2. The master "creates a log dump thread to notify the slave" that there is data to update.

  • 3. The "slave" 's I/O thread requests the binlog from the master node and "saves the received binlog content to its local relaylog".

  • 4. The "slave starts a SQL thread" to read the relaylog and "re-execute its content locally" , completing master-slave data synchronization.

"Synchronization strategy" :

  • 1. "Full synchronous replication" : The master database forcibly synchronizes logs to the slave database, and waits for all slave databases to be executed before returning to the client, which results in poor performance.

  • 2. "Semi-synchronous replication" : The operation is considered successful when the master database receives at least one confirmation from the slave database. The slave database successfully writes the log and returns ack confirmation.

28. How to solve the master-slave delay?

  • 1. MySQL 5.6 and later provide "parallel replication" , splitting the SQL replay thread into multiple worker threads.

  • 2. "Upgrade the machine's hardware" (the brute-force fix)

  • 3. Choose suitable sub-database and sub-table strategies early in the business to avoid the extra replication pressure caused by an oversized single table or database.

  • 4. "Avoid long transactions"

  • 5. "Avoid running large batch operations on the database"

  • 6. For businesses that are very sensitive to replication lag, "read directly from the master"

29.The size of the table does not change after deleting the table data. Why is this?

When delete is used to remove data, the rows are not physically removed — it is a "logical delete" : InnoDB merely "marks them as reusable" , so the table space does not shrink.

30.Why is VarChar recommended not to exceed 255?

When a varchar is defined with a length of 255 or less, the length prefix takes one byte (under utf-8 encoding).

When it is longer than 255, the length prefix takes two bytes; very long varchar columns can also run into the index prefix-length limit (767 bytes in older InnoDB row formats), so such a column "may only be partially indexable".

31.How to implement distributed transactions?

  • 1. "Local message table"

  • 2. "Transactional messages"

  • 3. "Two-phase commit (2PC)"

  • 4. "Three-phase commit (3PC)"

  • 5. "TCC (Try-Confirm-Cancel)"

  • 6. "Best-effort notification"

  • 7. "The Seata framework"

32.What are the locks in Mysql?

The following is not exhaustive; the point is to understand what each kind of lock means.

  • Classified by attribute: shared lock, exclusive lock

  • Classified by granularity: table lock, row lock, record lock, gap lock, next-key lock

  • Classified by state: intention shared lock, intention exclusive lock (deadlock is a related concept, not a lock type)

33. Why not use long transactions?

  • 1. Under concurrency, the database "connection pool is easily exhausted"

  • 2. "They easily cause heavy blocking and lock timeouts"

    • Long transactions hold lock resources for a long time and can drag down the whole database.

  • 3. Their long execution time easily causes "master-slave delay"

  • 4. "Rollback takes a long time"

    • The longer the transaction, the more changes there are to undo if it aborts.

  • 5. "The undolog grows larger and larger"

    • A long transaction means a very old transaction view exists in the system. Since that transaction may access any data in the database at any time, all the rollback records it might need must be kept until it commits, occupying a large amount of storage space.

34.What does the buffer pool do?

The buffer pool is an in-memory area. To "improve database performance" , when the database operates on data it loads the relevant pages from disk into the buffer pool and works on them there rather than touching the disk directly — inserts, deletes, updates, and reads are all performed against the buffer pool.

The data content cached in the buffer pool is also a data page.

Among them, "there are three major doubly linked lists" :

  • "free linked list"

    • Used to help us find free cache pages

  • "flush linked list"

    • Used to find dirty cache pages, that is, cache pages that need to be flushed

  • "lru linked list"

    • It is used to eliminate cache pages that are not frequently accessed. It is divided into hot data area and cold data area. The cold data area mainly stores data that is not often used.

Read-ahead mechanism:

  • The Buffer Pool has a feature called read-ahead (pre-reading): when the server layer calls the storage-engine interface, the engine predicts which pages are likely to be needed next and loads those data and index pages into the Buffer Pool in advance.
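The cold/hot split of the lru list interacts with read-ahead: newly loaded pages enter the cold region, so a burst of read-ahead pages can only evict other cold pages and never pushes hot pages out. A toy sketch — the capacities and the "promote on second access" rule are simplified assumptions, not InnoDB's exact policy:

```python
from collections import deque

class SplitLRU:
    def __init__(self, hot_cap=3, cold_cap=3):
        self.hot = deque(maxlen=hot_cap)    # frequently accessed pages
        self.cold = deque(maxlen=cold_cap)  # newly loaded pages

    def load(self, page):
        # first access: page enters the head of the cold region; if cold is
        # full, only another cold page is evicted (deque drops from the tail)
        if page not in self.cold and page not in self.hot:
            self.cold.appendleft(page)

    def touch(self, page):
        # accessed again: promote from the cold region to the hot region
        if page in self.cold:
            self.cold.remove(page)
            self.hot.appendleft(page)

pool = SplitLRU()
for p in ["p1", "p2"]:
    pool.load(p)
    pool.touch(p)                    # p1, p2 become hot pages
for p in ["r1", "r2", "r3", "r4"]:   # read-ahead burst of 4 cold pages
    pool.load(p)
print(list(pool.hot), list(pool.cold))  # hot pages survive the burst
```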

35. Tell me about your Sql tuning ideas

  • 1. "Table structure optimization"

    • 1.1 Split fields

    • 1.2 Selection of field type

    • 1.3 Limitations on field type size

    • 1.4 Reasonably add redundant fields

    • 1.5 New fields must have default values

  • 2. "Index aspect"

    • 2.1 Selection of index fields

    • 2.2 Make good use of features MySQL supports, such as index condition pushdown and covering indexes

    • 2.3 Choice of unique index and ordinary index

  • 3. "Query Statement"

    • 3.1 Avoid index failure

    • 3.2 Reasonable writing of where condition field order

    • 3.3 Small table drives large table

    • 3.4 You can use force index() to prevent the optimizer from selecting the wrong index.

  • 4. "Sub-library and sub-table"


Origin blog.csdn.net/piaomiao_/article/details/124449460