MySQL database: frequently asked interview questions

1. What is the difference between NOW() and CURRENT_DATE()?
The NOW() function returns the current date and time: year, month, day, hour, minute and second.
CURRENT_DATE() returns only the current date: year, month and day.

2. What is the difference between CHAR and VARCHAR?
(1) CHAR and VARCHAR differ in how values are stored and retrieved.
(2) A CHAR column has a fixed length, declared when the table is created; the length can range from 0 to 255.
(3) When CHAR values are stored, they are right-padded with spaces to the declared length, and the trailing spaces are removed when the values are retrieved.
(4) VARCHAR values are variable-length: they are stored without padding, together with a short length prefix.
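
The padding and stripping behavior described above can be sketched in a few lines of Python. This is only a model under simple assumptions, not MySQL's storage code; the function names store_char, retrieve_char and store_varchar are hypothetical.

```python
from typing import Tuple


def store_char(value: str, length: int) -> str:
    """Model CHAR(length) storage: right-pad with spaces to the declared length."""
    if len(value) > length:
        raise ValueError("value longer than declared CHAR length")
    return value.ljust(length)


def retrieve_char(stored: str) -> str:
    """Model CHAR retrieval: trailing spaces are removed."""
    return stored.rstrip(" ")


def store_varchar(value: str, max_length: int) -> Tuple[int, str]:
    """Model VARCHAR storage: a length prefix plus the value, with no padding."""
    if len(value) > max_length:
        raise ValueError("value longer than declared VARCHAR length")
    return (len(value), value)
```

One consequence of this model: a value such as "ab " loses its trailing space after a round trip through CHAR, while VARCHAR keeps it.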

3. What is the difference between a primary key and a unique index?
(1) The primary key is a constraint, while a unique index is an index; the two are different in nature.
(2) A primary key is always backed by a unique index, but a unique index is not necessarily a primary key.
(3) Unique index columns allow NULL values, while primary key columns do not.
(4) When a primary key is created, its columns implicitly become NOT NULL and unique.
(5) A table can have at most one primary key, but it can have multiple unique indexes.
(6) The primary key is best suited to unique identifiers that rarely change, such as auto-increment columns or ID numbers.
(7) Foreign keys in other tables usually reference the primary key, although in MySQL (InnoDB) a foreign key can also reference a column that has a unique index.

4. What are the different storage engines in MySQL?
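Commonly cited storage engines include:
MyISAM, Heap (now called MEMORY), Merge, InnoDB, ISAM
InnoDB has been the default engine since MySQL 5.5; the legacy ISAM engine is no longer available in modern MySQL versions.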

5. What is the life cycle of SQL?
(1) The application server establishes a connection with the database server.
(2) The database process receives the SQL request.
(3) It parses the statement, generates an execution plan, and executes it.
(4) It reads the data into memory and performs the logical processing.
(5) It sends the result to the client over the connection established in step 1.
(6) The connection is closed and its resources are released.

6. How do you see all the indexes defined for the table?
The indexes are listed with the following statement, where table_name stands for the name of the table:
SHOW INDEX FROM table_name;

7. Why does the database use B+ tree instead of B tree?
(1) B tree is mainly suited to random retrieval, while B+ tree supports both random retrieval and sequential retrieval.
(2) B+ tree has better space utilization and reduces the number of I/O operations, so disk reads and writes are cheaper. An index is usually too large to be held entirely in memory, so it is stored on disk as an index file, and looking up a key incurs disk I/O. The internal nodes of a B+ tree hold only keys used for navigation, not pointers to the record data, so they are smaller than B-tree nodes. A disk block can therefore hold more keys, more keys are read into memory in one I/O, and fewer I/O operations are needed per lookup. The number of I/O operations is the biggest factor in index lookup performance.
(3) The query efficiency of B+ tree is more stable. A B-tree search may end at a non-leaf node: the closer a record is to the root, the shorter the search, and finding the key is enough to confirm that the record exists. Overall its performance is comparable to a binary search over the full key set, but the cost varies from key to key. In a B+ tree, every key lookup follows a path from the root to a leaf, so the path length is the same for all keys and every lookup costs roughly the same.
(4) B-tree improves disk I/O performance but does not solve the problem of inefficient traversal. The leaf nodes of a B+ tree are linked in order by pointers, so traversing the leaves traverses the whole tree. Range queries are very common in databases, and a B-tree handles them inefficiently because it has to move up and down between levels.
(5) Inserting and deleting records (nodes) is more efficient. Because the leaf nodes of a B+ tree contain all the keys and are stored as an ordered linked list, insertions and deletions are cheaper.
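
The point about linked leaf nodes can be illustrated with a small Python sketch. It models only the leaf level of a B+ tree; the root-to-leaf descent is replaced by a binary search over the leaves, and the names Leaf, build_leaves and range_scan are hypothetical, so treat it as an illustration rather than a real B+ tree.

```python
from bisect import bisect_left
from typing import List, Optional


class Leaf:
    """A B+ tree leaf: sorted keys plus a pointer to the next leaf."""

    def __init__(self, keys: List[int]) -> None:
        self.keys = keys
        self.next: Optional["Leaf"] = None


def build_leaves(sorted_keys: List[int], capacity: int) -> List[Leaf]:
    """Split sorted keys into leaves of `capacity` keys and link them in order."""
    leaves = [Leaf(sorted_keys[i:i + capacity])
              for i in range(0, len(sorted_keys), capacity)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves


def range_scan(leaves: List[Leaf], low: int, high: int) -> List[int]:
    """Return all keys in [low, high]: find the start leaf, then follow next pointers."""
    # Stand-in for the root-to-leaf descent: binary search on each leaf's largest key.
    last_keys = [leaf.keys[-1] for leaf in leaves]
    start = bisect_left(last_keys, low)
    leaf = leaves[start] if start < len(leaves) else None
    result: List[int] = []
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result  # keys are ordered, so the scan can stop here
            if key >= low:
                result.append(key)
        leaf = leaf.next  # sequential move to the neighbouring leaf
    return result
```

For example, with leaves = build_leaves(list(range(1, 21)), 4), the call range_scan(leaves, 6, 13) starts in the second leaf and walks forward through the linked leaves without revisiting any upper level.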

8. What are the three normal forms of a database?
First normal form:
Each column is atomic and cannot be split further.
Second normal form:
On the basis of the first normal form, every non-primary-key column depends on the whole primary key, not on just part of it.
Third normal form:
On the basis of the second normal form, non-primary-key columns depend only on the primary key and not on other non-primary-key columns.
When designing the database structure, try to comply with the three normal forms. If you deviate from them, there should be a good reason, such as performance. In practice, database design is often compromised for the sake of performance.

9. How to optimize the SQL query statement?
(1) To optimize a query, try to avoid full table scans. First consider building indexes on the columns used in WHERE and ORDER BY.
(2) Avoid using * in the SELECT clause and list only the columns you need; writing SQL keywords in uppercase is a readability convention.
(3) Try to avoid testing a field for NULL in the WHERE clause, since this may cause the engine to give up the index and scan the whole table. Modern MySQL can often use an index for IS NULL, so check the execution plan.
(4) Using OR to connect conditions in the WHERE clause may also cause the engine to give up the index and perform a full table scan.
(5) IN and NOT IN should also be used with caution, since they may cause a full table scan.

10. Do you understand covering indexes and back-to-table lookups?
Covering index:
All the columns a query needs are contained in the index it uses, so the query can be answered from the index without reading the data rows.
Back to the table:
A secondary index does not contain all the columns of a row. When a query needs columns that are not in the secondary index, it first finds the primary key through the secondary index and then looks up the full row in the clustered index. This second lookup is called going back to the table.
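
The difference between a covered query and one that goes back to the table can be modeled in Python. The sketch below assumes a hypothetical student table with a clustered index keyed by id and a secondary index on age; the data and the query function are made up for illustration and do not reflect how InnoDB is implemented.

```python
from typing import Any, Dict, List, Tuple

# Clustered index: primary key -> full row (hypothetical student table).
clustered: Dict[int, Dict[str, Any]] = {
    1: {"id": 1, "name": "Ann", "age": 19},
    2: {"id": 2, "name": "Bob", "age": 22},
    3: {"id": 3, "name": "Cid", "age": 18},
}

# Secondary index on age: sorted (age, primary key) entries.
age_index: List[Tuple[int, int]] = sorted(
    (row["age"], pk) for pk, row in clustered.items()
)


def query(columns: List[str], max_age: int) -> Tuple[List[Dict[str, Any]], int]:
    """Return rows with age < max_age and the number of back-to-table lookups."""
    lookups = 0
    rows = []
    for age, pk in age_index:
        if age >= max_age:
            break  # index entries are sorted by age
        if set(columns) <= {"age", "id"}:
            # Covered: the index entry already holds every requested column.
            source = {"age": age, "id": pk}
        else:
            # Not covered: follow the primary key into the clustered index.
            source = clustered[pk]
            lookups += 1
        rows.append({column: source[column] for column in columns})
    return rows, lookups
```

Calling query(["age"], 20) needs no lookups in the clustered index, while query(["name"], 20) needs one lookup per matching row.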

11. What should I do if the CPU of the MySQL database soars?
Troubleshooting process:
(1) Use the top command to determine whether the load is caused by mysqld or by something else.
(2) If it is caused by mysqld, run SHOW PROCESSLIST, check the session states, and determine whether any resource-intensive SQL is running.
(3) Find the expensive SQL, and check whether its execution plan is accurate, whether an index is missing, and whether the data volume is too large.
Treatment:
(1) Kill these threads (and observe whether CPU usage drops).
(2) Make the corresponding adjustments (such as adding indexes, rewriting the SQL, or changing memory parameters).
(3) Run the SQL again.
Other situations:
It is also possible that no single SQL statement consumes many resources, but a large number of sessions suddenly connect and drive the CPU up. In this case, work with the application side to analyze why the number of connections surged, and then adjust accordingly, for example by limiting the number of connections.

12. What are the methods for optimizing SQL statements? (Choose a few)
(1) Order of conditions in the WHERE clause: join conditions between tables are written before other conditions, and the conditions that filter out the most records are written at the end of the WHERE clause; HAVING comes last. This advice dates from rule-based optimizers; MySQL's cost-based optimizer chooses the evaluation order itself, so verify any benefit with EXPLAIN.
(2) Replace IN with EXISTS and NOT IN with NOT EXISTS.
(3) Avoid performing calculations on index columns.
(4) Avoid using IS NULL and IS NOT NULL on index columns where possible.
(5) To optimize queries, try to avoid full table scans. First consider creating indexes on the columns used in WHERE and ORDER BY.
(6) Try to avoid testing a field for NULL in the WHERE clause, since the engine may give up the index and perform a full table scan.
(7) Try to avoid applying expressions to a field in the WHERE clause, since this causes the engine to give up the index and perform a full table scan.

13. InnoDB's transaction and log implementation
InnoDB has two types of logs, redo and undo.
Log storage form:
Redo: when a page is modified, the change is first written to the redo log buffer, then written to the file system cache of the redo log (fwrite), and then synchronized to the disk file (fsync).
Undo: in MySQL 5.5 and earlier, undo could only be stored in the ibdata file. From 5.6 onward, the undo log can be stored outside ibdata by setting the innodb_undo_tablespaces parameter.
How transactions are implemented through logs:
(1) When a transaction modifies a page, the undo record must be written first, and the redo for that undo record must be written before the undo record itself. The data page is then modified, and the redo for the data page change is recorded. Redo (including the redo for undo changes) must be persisted to disk before the data pages.
(2) When a transaction needs to be rolled back, the undo log allows the data page to be restored to its previous image. During crash recovery, if a transaction in the redo log has no corresponding commit record, undo is used to roll the transaction's changes back to the state before it started.
(3) If there is a commit record, redo is used to roll forward until the transaction is complete, and the transaction is committed.

14. Does a query on a non-clustered index always go back to the table?
Not necessarily. If all the fields used by the query are contained in the index, there is no need to go back to the table.
As a simple example, suppose there is an index on the age column of the student table. For the query select age from student where age < 20, the leaf nodes of the index already contain the age values, so no back-to-table lookup is performed.

15. What is the difference or pros and cons between the Hash index and the B+ tree?
First, consider how the two index types are implemented. A hash index is built on a hash table: a lookup calls the hash function once to find the entry for the key, and then goes back to the table to fetch the actual data. A B+ tree is a multi-way balanced search tree: every lookup starts from the root node and reaches the key only at a leaf node, and then, depending on the query, may or may not need to go back to the table.
This leads to the following differences:
(1) A hash index is generally faster for equality queries, but it cannot perform range queries.
(2) Because the hash function scatters the keys, the order of entries in a hash index does not follow the order of the key values, so range queries cannot be supported. In a B+ tree, all nodes are ordered (keys in the left subtree are smaller than the parent key and keys in the right subtree are larger, and the same holds across the multiple branches of each node), which naturally supports ranges.
(3) A hash index cannot be used for sorting, for the same reason.
(4) A hash index does not support fuzzy queries or leftmost-prefix matching of multi-column indexes, again because the hash function is unpredictable: the hash values of AAAA and AAAAB are unrelated.
(5) A hash index always requires a back-to-table lookup, whereas a B+ tree can answer a query from the index alone when certain conditions are met (clustered index, covering index, etc.).
(6) Although a hash index is faster for equality queries, its performance is unstable and hard to predict. When a key value is repeated many times, hash collisions occur and efficiency can become very poor. The query efficiency of a B+ tree is relatively stable: every query goes from the root node to a leaf node, and the tree is shallow.
(7) Therefore, in most cases, choosing a B+ tree index rather than a hash index gives stable and good query speed.
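
The contrast between equality and range lookups can be seen in a small Python model. A dict plays the role of the hash index and a sorted list plays the role of the ordered leaf level of a B+ tree; the key values and function names are hypothetical, and this is a sketch of the idea rather than of any storage engine.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List

keys = [17, 3, 42, 8, 25, 11]

# Hash index model: key -> row position (Python's dict is a hash table).
hash_index: Dict[int, int] = {key: pos for pos, key in enumerate(keys)}

# Ordered index model: keys kept sorted, as in the leaf level of a B+ tree.
ordered_index: List[int] = sorted(keys)


def hash_equal(key: int) -> bool:
    """Equality lookup: one hash computation, no ordering needed."""
    return key in hash_index


def hash_range(low: int, high: int) -> List[int]:
    """A hash index has no order, so a range query must test every key."""
    return sorted(key for key in hash_index if low <= key <= high)


def ordered_range(low: int, high: int) -> List[int]:
    """An ordered index finds both ends by binary search and returns the slice."""
    start = bisect_left(ordered_index, low)
    end = bisect_right(ordered_index, high)
    return ordered_index[start:end]
```

Both range functions return the same keys, but hash_range has to visit every entry and sort the result, while ordered_range touches only the matching slice, which is already in order.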

16. What does SELECT ... FOR UPDATE mean, and does it lock the table, the row, or something else?
The meaning of SELECT ... FOR UPDATE:
A plain SELECT does not take locks, whereas SELECT ... FOR UPDATE both reads the rows and locks them; it is a pessimistic lock. Whether it takes row locks or effectively locks the table depends on whether the query uses an index or the primary key.
If no index or primary key is used, every scanned row is locked, which amounts to a table lock; otherwise only the matching rows are locked.

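17. Does your database support storing emoji, and if not, what should you do?
Emoji need 4 bytes in UTF-8, which MySQL's utf8 character set (at most 3 bytes per character) cannot store. Change the character set from utf8 to utf8mb4.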
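
The reason utf8 is not enough can be checked with a short Python sketch: MySQL's utf8 character set stores at most 3 bytes per character, while emoji take 4 bytes in UTF-8. The helper names below are hypothetical.

```python
def utf8_byte_length(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_3_byte_utf8(text: str) -> bool:
    """True if every character needs at most 3 bytes, the limit of MySQL's utf8."""
    return all(utf8_byte_length(ch) <= 3 for ch in text)


GRINNING_FACE = "\U0001F600"  # an emoji outside the Basic Multilingual Plane
```

A string for which fits_in_3_byte_utf8 returns False needs a utf8mb4 column.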

18. Index data structures (B-tree, hash)
The data structure of an index depends on the implementation of the specific storage engine. The indexes used most in MySQL are the hash index and the B+ tree index, and the default index implementation of the commonly used InnoDB storage engine is the B+ tree index. The underlying data structure of a hash index is a hash table, so when most queries fetch a single record by exact key, a hash index can be chosen for its fast lookups; for most other scenarios, a BTree index is recommended.
B-tree index:
MySQL fetches data through the storage engine, and most deployments use InnoDB. By implementation, the two index types discussed here are the BTREE (B-tree) index and the HASH index. The B-tree index is the most frequently used index type in MySQL, and essentially all storage engines support it. When we say "index" we usually mean the B-tree index. It is actually implemented as a B+ tree, but MySQL always prints BTREE when showing table indexes, so it is called a B-tree index for short.
Hash index:
In brief, this is similar to the simple hash table taught in data structures courses. When a hash index is used in MySQL, a hash algorithm (common ones include direct addressing, mid-square, folding, division-remainder, and random number methods) converts the column value into a fixed-length hash value, which is stored in the corresponding slot of the hash table together with a pointer to the row. If a hash collision occurs (two different keys have the same hash value), the entries are stored as a linked list under that hash key. Of course, this is only a rough description.

19. What is the leftmost matching principle of the index?
When creating a composite index, you generally need to follow the leftmost matching principle: a query can use the index only if it filters on the index's leftmost columns, so the column with the highest selectivity should be placed first.

20. What is the purpose of indexing?
(1) Quickly access specific information in the data table to improve retrieval speed.
(2) Create a unique index to ensure the uniqueness of each row of data in the database table.
(3) Speed up joins between tables.
(4) When using grouping and sorting clauses for data retrieval, the time for grouping and sorting in queries can be significantly reduced.

21. What is the negative impact of indexing on the database system?
Creating and maintaining indexes takes time, and this time grows with the amount of data. Indexes occupy physical space: in addition to the space used by the table data, each index needs its own storage. When rows are inserted, deleted, or updated, the indexes must be maintained as well, which slows down data modification.

22. What are the principles for indexing data tables?
(1) Build an index on the most frequently used field to narrow the scope of the query.
(2) Build indexes on frequently used fields that need to be sorted

23. Under what circumstances is it not appropriate to build an index?
(1) For columns that are rarely involved in queries or columns with many repeated values, it is not appropriate to build indexes.
(2) For some special data types, it is not suitable to establish indexes, such as text fields (text), etc.
(3) It is not suitable to establish indexes for tables that are frequently added, deleted, and modified.

24. What is the leftmost prefix principle? What is the leftmost matching principle?
The leftmost prefix principle means leftmost priority. When creating a multi-column index, place the column used most frequently in WHERE clauses on the far left, according to business needs.
When we create a composite index such as (k1, k2, k3), it is equivalent to creating the three indexes (k1), (k1, k2) and (k1, k2, k3). This is the leftmost matching principle.
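
Why only leftmost prefixes can be used follows from how a composite index is ordered. The Python sketch below models a hypothetical index on (k1, k2, k3) as a sorted list of tuples; the data and the names seek_prefix and scan_k2 are made up for illustration.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Entry = Tuple[int, int, int]

# Hypothetical composite index on (k1, k2, k3): sorted by k1, then k2, then k3.
index: List[Entry] = sorted([
    (1, 1, 5), (1, 2, 3), (1, 2, 9), (2, 1, 4), (2, 3, 1), (3, 2, 7),
])


def seek_prefix(prefix: Tuple[int, ...]) -> List[Entry]:
    """Binary-search the entries whose leading columns equal `prefix`."""
    n = len(prefix)
    # Truncating every entry keeps the list sorted only because the prefix is leftmost.
    # Building this list is O(n) and done here for brevity; a real index compares in place.
    leading = [entry[:n] for entry in index]
    return index[bisect_left(leading, prefix):bisect_right(leading, prefix)]


def scan_k2(value: int) -> List[Entry]:
    """Filtering on k2 alone cannot seek: matches are scattered, so scan everything."""
    return [entry for entry in index if entry[1] == value]
```

Entries matching k1 = 1, or k1 = 1 and k2 = 2, are contiguous and can be found by binary search, whereas entries with k2 = 2 alone are spread across the whole index.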

25. What is myisamchk used for?
myisamchk checks, repairs, and optimizes MyISAM tables. Compressing MyISAM tables to reduce disk usage is done with the separate myisampack utility.

26. Tell me about the design of database and table sharding
Sharding schemes:
(1) Horizontal database sharding: based on a field and a chosen strategy (hash, range, etc.), the data in one database is split across multiple databases.
(2) Horizontal table sharding: based on a field and a chosen strategy (hash, range, etc.), the data in one table is split across multiple tables.
(3) Vertical database sharding: based on tables, different tables are moved into different databases according to the business they belong to.
(4) Vertical table sharding: based on fields and how actively they are used, the fields of one table are split into different tables (a main table and an extension table).
Commonly used sharding middleware:
sharding-jdbc, Mycat, TDDL, Oceanus, vitess, Atlas
Problems that sharding may run into:
(1) Transactions: distributed transactions are needed.
(2) Cross-node joins: these can be handled by splitting the join into two queries.
(3) Cross-node COUNT, ORDER BY, GROUP BY, and aggregate functions: fetch the results from each node and merge them on the application side.
(4) Global IDs: after the database is split, you can no longer rely on the database's own primary key generation. The simplest option is a UUID.
(5) Sorting and paging across shards (increase the page size and process in the background?)
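
A hash-based horizontal sharding rule, a UUID-based global ID, and an application-side merge of per-shard counts can be sketched as follows. The shard count, the choice of CRC32 as the hash, and all function names are assumptions made for this illustration; the middleware listed above uses its own routing rules.

```python
import uuid
import zlib
from typing import List

SHARD_COUNT = 4  # hypothetical number of shards


def shard_for(sharding_key: str, shard_count: int = SHARD_COUNT) -> int:
    """Route a row to a shard using a stable hash of the sharding key."""
    # zlib.crc32 gives the same value in every process, unlike the built-in hash() for str.
    return zlib.crc32(sharding_key.encode("utf-8")) % shard_count


def new_global_id() -> str:
    """Generate an ID without relying on any single database's auto-increment."""
    return uuid.uuid4().hex


def merged_count(per_shard_counts: List[int]) -> int:
    """Cross-shard COUNT: each shard counts locally and the application sums the results."""
    return sum(per_shard_counts)
```

A plain modulo rule like this one is simple, but changing the shard count later moves most keys to a different shard, which is one reason range-based or consistent-hashing strategies are also used.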

27. Under what circumstances is an index defined but not used?
(1) A LIKE pattern that starts with "%" (fuzzy matching on the prefix)
(2) The columns on both sides of an OR are not all indexed
(3) The data type is implicitly converted (for example, a varchar column compared with a value written without single quotes may be converted to a number)
(4) A calculation or function is applied to the indexed field
(5) Negative conditions, such as IS NOT NULL

28. How to delete an index?
Delete the primary key index:
ALTER TABLE table_name DROP PRIMARY KEY (no index name is needed because there is only one primary key). Note that if the primary key column is AUTO_INCREMENT, this operation cannot be performed directly, because auto-increment depends on the primary key index.
Delete other indexes:
ALTER TABLE table_name DROP INDEX index_name

29. What is a database connection pool? Why do you need a database connection pool?
The principle of the database connection pool:
An internal object pool maintains a certain number of database connections and exposes methods for acquiring and returning them.
The process of establishing a connection between the application and the database:
(1) Establish a connection with the database server through the TCP three-way handshake.
(2) Send the database user name and password, and wait for the database to verify the user's identity.
(3) After authentication, the system can submit SQL statements to the database for execution.
(4) Close the connection; TCP performs its four-way teardown.
Benefits of a database connection pool:
(1) Resource reuse (connection reuse)
(2) Faster system response
(3) A new way of allocating resources
(4) Unified connection management, which avoids database connection leaks
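
The acquire and return methods described above can be sketched with a thread-safe queue. This is a minimal illustration, not a production pool: FakeConnection is a hypothetical stand-in for a real database connection, and the sketch leaves out health checks, resizing, and reconnection.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Placeholder for a real database connection (hypothetical)."""

    created = 0  # counts how many connections were ever opened

    def __init__(self) -> None:
        FakeConnection.created += 1
        self.id = FakeConnection.created


class ConnectionPool:
    def __init__(self, size: int) -> None:
        # Pay the connection setup cost once, up front.
        self._idle: "queue.Queue[FakeConnection]" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(FakeConnection())

    def acquire(self, timeout: float = 1.0) -> FakeConnection:
        # Blocks until a connection is free; raises queue.Empty on timeout.
        return self._idle.get(timeout=timeout)

    def release(self, conn: FakeConnection) -> None:
        self._idle.put(conn)

    @contextmanager
    def connection(self):
        conn = self.acquire()
        try:
            yield conn
        finally:
            self.release(conn)  # always returned, which prevents leaks
```

With pool = ConnectionPool(2), any number of "with pool.connection() as conn:" blocks reuse the same two connections instead of opening a new one each time.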

30. What is the column comparison operator?
Use the =, <>, <=, <, >=, >, <<, >>, <=>, AND, OR, or LIKE operators in column comparisons in a SELECT statement.

31. According to the granularity of locks, what are the database locks?
According to lock granularity:
table lock, page lock, row lock
According to lock mechanism:
optimistic lock, pessimistic lock

32. What do % and _ in the LIKE statement mean?
In a LIKE pattern, % matches zero or more characters and _ matches exactly one character.
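
The two wildcards can be mimicked in Python by translating a LIKE pattern into a regular expression. This sketch ignores escape characters and collation (it is case-sensitive), and the function names are hypothetical.

```python
import re


def like_to_regex(pattern: str):
    """Translate a LIKE pattern: % becomes '.*', _ becomes '.', the rest is literal."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """True if the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None
```

For instance, like("mysql", "my%") and like("my", "my%") both hold because % may match nothing, while like("ct", "c_t") does not because _ requires exactly one character.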

33. How to convert between Unix and MySQL timestamp?
UNIX_TIMESTAMP is the function that converts a MySQL timestamp to a Unix timestamp.
FROM_UNIXTIME is the function that converts a Unix timestamp to a MySQL timestamp.
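
The same two conversions can be written in Python for comparison. The functions below are named after their MySQL counterparts only for readability; they are analogues, not MySQL itself, and they assume UTC, whereas MySQL interprets and formats values in the session time zone.

```python
from datetime import datetime, timezone


def unix_timestamp(dt: datetime) -> int:
    """Analogue of UNIX_TIMESTAMP(): date and time -> seconds since 1970-01-01 UTC."""
    return int(dt.timestamp())


def from_unixtime(seconds: int) -> datetime:
    """Analogue of FROM_UNIXTIME(): seconds since the epoch -> date and time (UTC here)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

The two functions are inverses of each other for whole seconds, so from_unixtime(unix_timestamp(dt)) returns dt when dt is timezone-aware and has no fractional seconds.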

34. How to locate and optimize the performance problems of SQL statements? Has the created index been used? Or how can I know why this statement runs so slowly?
The most important and effective way to locate a slow SQL statement is to look at its execution plan. MySQL provides the EXPLAIN command for viewing the execution plan of a statement. Whatever the database or storage engine, many optimizations are applied while a SQL statement is executed, and for queries the most important one is the use of indexes. The execution plan shows how the database engine executes the SQL statement, including whether an index is used, which index is used, and details about that index.

Origin blog.csdn.net/ma286388309/article/details/129620831