Interview - MySQL Q&A Highlights

1. Database Architecture

1.1. Talk about the infrastructure diagram of MySQL

Describe the logical architecture of MySQL to the interviewer. If a whiteboard is available, you can sketch the classic architecture diagram (the version commonly found online).


The MySQL logical architecture is divided into three layers:

(1) The first layer handles connection management, authorization and authentication, security, and so on.

(2) The second layer parses, compiles, and optimizes SQL.

(3) The third layer is the storage engine.

1.2. How is an SQL query executed in MySQL?

  • The statement's permissions are checked first. Without permission, an error is returned directly; with permission, the query cache is consulted first (before MySQL 8.0, which removed the query cache).

  • On a cache miss, the analyzer performs lexical analysis to extract key elements of the SQL statement, such as the select keyword, and then checks the statement for syntax errors, e.g. whether the keywords are spelled correctly.

  • Finally, the optimizer determines the execution plan and permissions are verified. Without permission, an error message is returned directly; with permission, the storage-engine interface is called and the execution result is returned.

2. SQL optimization

2.1. How do you optimize SQL in your daily work?

This question can be answered from the following dimensions:

2.1.1. Optimize the table structure

(1) Try to use numeric fields

Fields that contain only numeric information should not be designed as character types; that reduces query and join performance and increases storage cost. This is because the engine compares a string character by character when processing queries and joins, whereas a number needs only one comparison.

(2) Use varchar instead of char as much as possible

Variable-length fields store only the bytes they actually need, which saves storage space.

(3) When the index column has a large amount of duplicate data, the index can be deleted

For example, on a gender column whose values are almost entirely male, female, or unknown, an index is nearly useless.
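To judge whether a column is selective enough to be worth indexing, you can compare the count of distinct values to the total row count. A sketch, assuming a hypothetical t_user table with a gender column and an idx_gender index:

```sql
-- Selectivity close to 0 means the index filters out almost nothing.
SELECT COUNT(DISTINCT gender) / COUNT(*) AS selectivity
FROM t_user;

-- If selectivity is very low (e.g. a 3-value gender column), drop the index:
ALTER TABLE t_user DROP INDEX idx_gender;
```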

2.1.2. Optimizing the query

  • Try to avoid using the != or <> operators in where clauses

  • Try to avoid using or to join conditions in where clauses

  • Never write select * in a query; list only the columns you need

  • Avoid comparing fields against null values in where clauses
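The rules above can be illustrated with a sketch (the t_user table and its columns are hypothetical):

```sql
-- Bad: OR across different columns may prevent index use, and SELECT * loads everything.
SELECT * FROM t_user WHERE name = 'tom' OR age = 18;

-- Better: list only the needed columns and split the OR into a UNION ALL,
-- so each branch can use its own index.
SELECT id, name FROM t_user WHERE name = 'tom'
UNION ALL
SELECT id, name FROM t_user WHERE age = 18 AND name <> 'tom';
```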

2.1.3. Optimize indexes

  • Index the fields used as query conditions and order by

  • Avoid creating too many indexes and use composite indexes more
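A composite index can replace several single-column indexes. A sketch, assuming a hypothetical t_order table:

```sql
-- One composite index covers queries on (user_id), (user_id, status),
-- and (user_id, status, create_time) via the leftmost prefix,
-- replacing three separate indexes.
ALTER TABLE t_order ADD INDEX idx_user_status_time (user_id, status, create_time);

-- Served by the index above, including the ORDER BY:
SELECT id FROM t_order
WHERE user_id = 42 AND status = 1
ORDER BY create_time;
```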

2.2. How to read the execution plan (explain) and how to understand the meaning of each field?

Adding the explain keyword before the select statement will return the execution plan information.
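A minimal example (the t_user table and its index are hypothetical):

```sql
-- Returns one row of plan information per table accessed.
EXPLAIN SELECT id, name FROM t_user WHERE name = 'tom';
-- Things to check in the output: type (ref is good, ALL means a full scan),
-- key (which index was actually chosen), rows (estimated rows examined).
```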


(1) id column: the serial number of the select within the query. MySQL classifies select queries as simple or complex.

(2) select_type column: Indicates whether the corresponding row is a simple or complex query.

(3) table column: which table is being accessed by a row of explain.

(4) type column: one of the most important columns. Represents an association type or access type, which is how MySQL decides how to look up rows in a table. From best to worst: system > const > eq_ref > ref > fulltext > ref_or_null > index_merge > unique_subquery > index_subquery > range > index > ALL

(5) possible_keys column: shows which indexes the query might use for the lookup.

(6) key column: shows which index MySQL actually uses to optimize access to the table.

(7) key_len column: shows the number of bytes of the index MySQL uses. From this value you can work out which columns of a composite index are actually used.

(8) ref column: shows the columns or constants that are compared against the index named in the key column to select rows. Common values are const (a constant), func, NULL, and a field name.

(9) rows column: This column is the number of rows that MySQL estimates to be read and detected. Note that this is not the number of rows in the result set.

(10) Extra column: Display extra information. For example, there are Using index, Using where, Using temporary, etc.

2.3. Do you pay attention to time-consuming SQL in the business system? Do you collect slow-query statistics? How do you optimize slow queries?

When writing SQL, get into the habit of analyzing it with explain. Slow-query statistics are typically collected from the slow query log and reported to us periodically by the operations team.

Optimizing slow query ideas:

  • Analyze the statement: are unnecessary fields or rows being loaded?

  • Analyze the execution of the SQL: does it hit an index, and so on.

  • If the SQL is complex, optimize its structure.

  • If the table holds too much data, consider splitting the table.
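Slow queries are captured by MySQL's slow query log, which can be enabled at runtime. A sketch (thresholds are illustrative):

```sql
-- Enable the slow query log and record statements slower than 1 second.
SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 1;

-- Check where slow queries are written:
SHOW VARIABLES LIKE 'slow_query_log_file';
```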

3. Index

3.1. The difference between a clustered index and a non-clustered index

It can be answered in the following four dimensions:

(1) A table can only have one clustered index, and a table can have multiple non-clustered indexes.

(2) For a clustered index, the logical order of the key values in the index determines the physical order of the corresponding rows in the table; for a non-clustered index, the logical order of the index entries differs from the physical storage order of the rows on disk.

(3) The index is organized as a tree. A clustered index can be understood this way: its leaf nodes are the data pages themselves. The leaf node of a non-clustered index is still an index record, holding a pointer to the corresponding data (in InnoDB, the primary-key value).

(4) Clustered index: physical storage is sorted by the index; non-clustered index: physical storage is not sorted by the index.
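In InnoDB the primary key is the clustered index, and secondary indexes store the primary-key value in their leaves, so a secondary-index lookup may need an extra trip back to the clustered index. A sketch with a hypothetical table:

```sql
CREATE TABLE t_user (
  id   INT PRIMARY KEY,        -- clustered index: rows stored in id order
  name VARCHAR(50),
  age  INT,
  KEY idx_name (name)          -- secondary index: leaves hold (name, id)
);

-- Covered by idx_name alone (name and id are both in the index):
SELECT id FROM t_user WHERE name = 'tom';

-- Needs a lookup back to the clustered index to fetch age:
SELECT age FROM t_user WHERE name = 'tom';
```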

3.2. Why use B+ tree, why not use ordinary binary tree?

This question can be examined along several dimensions: is lookup fast enough, is the efficiency stable, how much data fits per node, and how many disk reads are needed? Why not an ordinary binary tree, why not a balanced binary tree, why not a B-tree, but a B+ tree?

3.2.1. Why is it not an ordinary binary tree?

An ordinary binary search tree can degenerate into a linked list, which is equivalent to a full table scan. Compared with a plain binary search tree, a balanced binary tree offers more stable search efficiency and faster overall lookups.

3.2.2. Why not a balanced binary tree?

We know that querying data in memory is much faster than querying data on disk. When a tree structure is used as an index, each node visited during a lookup means reading one node, i.e. one disk block, from disk. A balanced binary tree can store only one key value and its data per node. A B-tree node, by contrast, stores many keys, so the tree is shorter, fewer disk reads are needed, and queries are faster.

3.2.3. Why is it not a B tree but a B+ tree?

Non-leaf nodes of a B+ tree store only key values, no row data, while a B-tree node stores both keys and data. The default page size in InnoDB is 16KB; if a page holds no row data, it can hold more keys, so the fan-out (the number of children per node) is larger and the tree is shorter and fatter. This further reduces the number of disk I/Os needed to find data, making queries faster.

All data in a B+ tree index is stored in the leaf nodes, the entries are kept in sorted order, and the leaves are connected in a linked list. This makes range queries, sorted queries, grouped queries, and deduplication extremely simple for a B+ tree.

3.3. What is the difference between Hash index and B+ tree index? How did you choose when designing the index?

  • B+ trees can perform range queries, but Hash indexes cannot.

  • B+ tree supports the leftmost principle of joint index, Hash index does not support.

  • B+ tree supports order by sorting, but Hash index does not.

  • Hash indexes are more efficient than B+ trees for equality queries.

  • With a B+ tree, a like pattern with a fixed prefix (e.g. 'abc%') can still use the index, while a pattern starting with % cannot; a Hash index cannot support fuzzy matching at all.
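InnoDB does not let you choose a hash index explicitly (it only builds an adaptive hash index internally), but the MEMORY engine does, which makes the trade-off easy to demonstrate. A sketch with a hypothetical table:

```sql
CREATE TABLE t_session (
  token   VARCHAR(64),
  user_id INT,
  KEY idx_token (token) USING HASH
) ENGINE = MEMORY;

-- Fast equality lookup via the hash index:
SELECT user_id FROM t_session WHERE token = 'abc123';

-- Cannot use the hash index; a B+ tree index could serve this range scan:
SELECT user_id FROM t_session WHERE token > 'abc';
```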

3.4. What is the leftmost prefix principle? What is the leftmost matching principle?

The leftmost prefix principle is the leftmost first. When creating a multi-column index, according to business requirements, the most frequently used column in the where clause is placed on the leftmost.

When we create a composite index such as (a1, a2, a3), it is as if we created three indexes: (a1), (a1, a2), and (a1, a2, a3). This is the leftmost matching principle.
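A sketch of which queries can and cannot use such a composite index (table t and columns a1, a2, a3 are hypothetical):

```sql
-- Composite index on (a1, a2, a3):
ALTER TABLE t ADD INDEX idx_a (a1, a2, a3);

-- Can use the index (leftmost prefixes):
SELECT * FROM t WHERE a1 = 1;
SELECT * FROM t WHERE a1 = 1 AND a2 = 2;
SELECT * FROM t WHERE a1 = 1 AND a2 = 2 AND a3 = 3;

-- Cannot use the index (the leftmost column a1 is missing):
SELECT * FROM t WHERE a2 = 2 AND a3 = 3;
```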

3.5. Which scenarios are indexes not suitable for?

  • Not suitable for indexing with small amount of data

  • Frequently updated fields are not suitable for indexing

  • Fields with low discrimination are not suitable for indexing (such as gender)

3.6. What are the advantages and disadvantages of indexes?

(1) Advantages:

  • A unique index can ensure the uniqueness of each row of data in a database table

  • Indexes can speed up data queries and reduce query time

(2) Disadvantages:

  • Creating and maintaining indexes takes time

  • The index needs to occupy physical space. In addition to the data space occupied by the data table, each index also occupies a certain amount of physical space.

  • When adding, deleting, and modifying the data in the table, the index should also be dynamically maintained.

4. Lock

4.1. Has MySQL encountered a deadlock problem and how did you solve it?

Yes. My general steps for troubleshooting a deadlock are as follows:

(1) View the deadlock log show engine innodb status;

(2) Find out the deadlock Sql

(3) Analyze the sql lock situation

(4) Simulate a deadlock incident

(5) Analyze deadlock logs

(6) Analysis of deadlock results
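A minimal way to reproduce a deadlock for step (4), assuming a hypothetical table t containing rows with id = 1 and id = 2:

```sql
-- Session 1:
BEGIN;
UPDATE t SET v = v + 1 WHERE id = 1;   -- locks row 1

-- Session 2:
BEGIN;
UPDATE t SET v = v + 1 WHERE id = 2;   -- locks row 2

-- Session 1:
UPDATE t SET v = v + 1 WHERE id = 2;   -- blocks, waiting for session 2

-- Session 2:
UPDATE t SET v = v + 1 WHERE id = 1;   -- deadlock: InnoDB rolls one transaction back

-- Then inspect the LATEST DETECTED DEADLOCK section of:
SHOW ENGINE INNODB STATUS;
```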

4.2. What are the optimistic locks and pessimistic locks of the database and their differences?

(1) Pessimistic lock:

A pessimistic lock assumes the worst: it always worries that the data it holds may be modified by other transactions. So once a transaction acquires a pessimistic lock, no other transaction can modify the data; they can only wait for the lock to be released.

(2) Optimistic locking:

The "optimism" of optimistic locking is reflected in the belief that data changes will not be too frequent. Therefore, it allows multiple transactions to make changes to the data at the same time.

Implementation method: Optimistic locks are generally implemented using the version number mechanism or the CAS algorithm.
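The version-number mechanism can be sketched in SQL (the t_account table and the version value read are hypothetical):

```sql
-- Read the row and remember its version:
SELECT balance, version FROM t_account WHERE id = 1;

-- Update only if nobody changed the row in the meantime:
UPDATE t_account
SET balance = balance - 10, version = version + 1
WHERE id = 1 AND version = 3;   -- 3 is the version read above
-- If affected rows = 0, another transaction got there first: retry or abort.
```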

4.3. Are you familiar with MVCC and its underlying principles?

MVCC (Multiversion Concurrency Control), the multi-version concurrency control technology.

The implementation of MVCC in MySQL InnoDB is mainly to improve the concurrent performance of the database, and to handle read-write conflicts in a better way, so that even when there is a read-write conflict, it can achieve non-locking and non-blocking concurrent reading.

5. Transactions

5.1. Four characteristics of MySQL transactions and their implementation principles

  • Atomicity: The transaction is executed as a whole, and either all or none of the operations on the database contained within it are executed.

  • Consistency: the database remains in a consistent state before the transaction begins and after it ends. If account A transfers 10 yuan to account B, the total of A and B is unchanged whether the transfer succeeds or not.

  • Isolation: when multiple transactions run concurrently, they are isolated from each other; one transaction does not affect the outcome of another. In short, transactions stay out of each other's way.

  • Durability: after a transaction completes, the changes it made to the database are persisted permanently.

5.2. What are the isolation levels of transactions? What is MySQL's default isolation level?

  • Read Uncommitted

  • Read Committed

  • Repeatable Read

  • Serializable

MySQL's default transaction isolation level is Repeatable Read.
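You can check and change the level from a session (the variable is named transaction_isolation since MySQL 8.0; older versions use tx_isolation):

```sql
-- Check the current level:
SELECT @@transaction_isolation;

-- Change it for the current session only:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
```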

5.3. What are phantom reads, dirty reads, and non-repeatable reads?

Transactions A and B execute alternately, and transaction A reads data that transaction B has modified but not yet committed. This is a dirty read.

Within one transaction, two identical queries against the same records return different data. This is a non-repeatable read.

Transaction A queries a range of rows; a concurrent transaction B inserts or deletes rows within that range and quietly commits; when transaction A queries the same range again, the two reads return different result sets. This is a phantom read.

6. Actual combat

6.1. What should I do if the CPU of MySQL database soars?

Investigation process:

(1) Use the top command to determine whether the load is caused by mysqld or by something else.

(2) If it is caused by mysqld, run show processlist, check the session states, and determine whether any resource-intensive SQL is running.

(3) Find the expensive SQL and check whether its execution plan is reasonable, whether an index is missing, and whether the data volume is simply too large.
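The investigation steps above map to a couple of commands (the thread id below is hypothetical):

```sql
-- Step (2): see what each session is doing; a long Time combined with a state
-- like "Sending data" or "Copying to tmp table" is suspicious.
SHOW FULL PROCESSLIST;

-- First step of handling: kill an offending session by its Id from the list:
KILL 12345;
```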

Handling:

(1) Kill these threads (and observe whether the CPU usage drops)

(2) Make corresponding adjustments (such as adding indexes, changing sql, and changing memory parameters)

(3) Re-run these SQL.

Other situations:

It is also possible that no single SQL statement consumes many resources, but a sudden surge of connected sessions drives the CPU up. In that case, analyze with the application team why the connection count spiked, then adjust accordingly, for example by limiting the number of connections.

6.2. How do you solve the master-slave delay of MYSQL?

Master-slave replication is divided into five steps:


  • Step 1: The update events (update, insert, delete) of the main library are written to binlog

  • Step 2: Initiate a connection from the library and connect to the main library.

  • Step 3: At this time, the main library creates a binlog dump thread and sends the contents of the binlog to the slave library.

  • Step 4: After the slave starts replication, it creates an I/O thread that reads the binlog content sent by the master and writes it to the relay log

  • Step 5: A SQL thread will also be created to read the content from the relay log, execute the read update event from the Exec_Master_Log_Pos position, and write the updated content to the slave's db
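On the slave, the health of these threads and the current lag can be checked with:

```sql
-- Run on the slave: shows the state of the I/O and SQL threads.
SHOW SLAVE STATUS\G
-- Key fields: Slave_IO_Running, Slave_SQL_Running, Seconds_Behind_Master,
-- and Exec_Master_Log_Pos (how far the SQL thread has applied the relay log).
```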

Causes of master-slave synchronization delay

The master serves N client connections, so there can be heavy concurrent update traffic, but the slave has only a single SQL thread applying the binlog. When some SQL takes a long time to execute on the slave, or some SQL has to lock a table, SQL from the master backs up and is not applied on the slave in time. This causes master-slave inconsistency, i.e. master-slave delay.

Master-slave synchronization delay solution

  • The master server is responsible for the update operation and has higher security requirements than the slave server, so some setting parameters can be modified, such as sync_binlog=1, innodb_flush_log_at_trx_commit=1 and other settings.

  • Choose a better hardware device as the slave.

  • If a slave server is used as a backup instead of providing queries, its load will be reduced, and the efficiency of executing the SQL in the relay log will naturally be high.

  • The purpose of increasing the slave server is to spread the pressure of reading, thereby reducing the server load.

6.3. If you were asked to do the design of sub-libraries and sub-tables, what would you do?

Sub-library and sub-table scheme:

  • Horizontal sub-library: Based on the field, according to a certain strategy (hash, range, etc.), the data in one library is divided into multiple libraries.

  • Horizontal table splitting: Based on fields, according to certain strategies (hash, range, etc.), the data in one table is split into multiple tables.

  • Vertical sub-library: Based on the table, different tables are divided into different libraries according to the business ownership.

  • Vertical table division: Based on the field, the fields in the table are split into different tables (main table and extension table) according to the activity of the field.
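Horizontal table splitting can be sketched as follows (the t_order tables and the modulo-2 routing rule are hypothetical):

```sql
-- Two physical shards of the same logical t_order table:
CREATE TABLE t_order_0 (id BIGINT PRIMARY KEY, user_id BIGINT, amount DECIMAL(10,2));
CREATE TABLE t_order_1 (id BIGINT PRIMARY KEY, user_id BIGINT, amount DECIMAL(10,2));

-- The application (or middleware such as sharding-jdbc / Mycat) routes by key:
-- user_id = 7  ->  7 % 2 = 1  ->  t_order_1
INSERT INTO t_order_1 (id, user_id, amount) VALUES (1001, 7, 99.90);
```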

Commonly used sub-database and sub-table middleware:

  • sharding-jdbc

  • Mycat

Problems that may be encountered in sub-library and sub-table

  • Transaction problem: need to use distributed transactions

  • The problem of cross-node joins: this can be solved by issuing two separate queries and combining the results in the application

  • Cross-node count, order by, group by and aggregation function problems: merge the results on the application side after getting the results on each node.

  • Data migration, capacity planning, capacity expansion, etc.

  • ID problem: after the database is split, you can no longer rely on the database's own primary-key generation mechanism. The simplest option to consider is UUID

  • Sorting and paging problems across shards


Origin blog.csdn.net/qq_34272760/article/details/121218576