[Soul-searching questions] 100 high-frequency MySQL interview questions (for engineers)

Author: Huyan ten

juejin.im/post/5d351303f265da1bd30596f9

Foreword

This article is aimed at developers. It does not cover operations topics such as deploying MySQL services. It is long, so get your melon seeds and mineral water ready and be patient.

A while ago I studied MySQL systematically, and I also have some practical experience with it. When I happened to come across an article of MySQL interview questions, I found that I could not answer some of them well: I knew most of the individual points, but I had not connected them together.

So I decided to write my own set of 100 soul-searching MySQL questions, answering them one by one to deepen my understanding of each point.

This article does not teach MySQL from the beginning, starting with how to use select. It focuses on what developers need to know about MySQL, including indexes, transactions, and optimization, and presents it as frequently asked interview questions with answers.

Index-related

I have previously written a summary article about MySQL indexes; the link is here: MySQL index principles and optimization.

1. What is an index?

An index is a data structure that helps us find data quickly.

2. What kind of data structure is an index?

The data structure of an index depends on how the storage engine implements it. The indexes most commonly used in MySQL are hash indexes and B+ tree indexes. InnoDB, the default storage engine that we use most often, implements its indexes as B+ trees.

3. What are the differences between hash indexes and B+ tree indexes, and what are their pros and cons?

First we need to know how hash indexes and B+ tree indexes are implemented underneath:

A hash index is backed by a hash table. A lookup calls the hash function once to locate the key, then goes back to the table to fetch the actual row. A B+ tree is a balanced multi-way search tree. Every lookup starts at the root and reaches a leaf node to obtain the key, then decides whether a back-to-table query is needed.

From this we can see the following differences:

Hash indexes are (generally) faster for equality queries, but they cannot serve range queries.

Because a hash index places entries by a hash function, the order of the index entries does not match the original key order, so range queries are not supported. All nodes of a B+ tree are kept in order (the left child is smaller than the parent and the right child is larger; a multi-way tree works similarly), so ranges are supported naturally.

Hash indexes cannot be used for sorting, for the same reason.
Hash indexes support neither fuzzy queries nor leftmost-prefix matching on multi-column indexes, again because the hash function is unpredictable: the hashes of AAAA and AAAAB are unrelated.
A hash index always has to go back to the table for the row data, while a B+ tree can answer a query from the index alone when certain conditions are met (clustered index, covering index, and so on).
Hash indexes are faster for equality queries, but their performance is unstable and unpredictable: when a key value has many duplicates, hash collisions occur and efficiency can be very poor. B+ tree query efficiency is more stable, because every query goes from the root to a leaf and the tree is short.

Therefore, in most cases choosing a B+ tree index directly gives stable and good query speed, and there is no need for a hash index.
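The comparison above can be sketched in a few lines of Python. This is only a toy model, not how MySQL stores indexes: a dict stands in for the hash index, a sorted list searched with bisect stands in for the ordered leaf level of a B+ tree, and the names rows, hash_lookup, range_lookup, and prefix_lookup are made up for the illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data: indexed key -> row id.
rows = {"alice": 1, "bob": 2, "carol": 3, "dave": 4, "erin": 5}

# "Hash index" stand-in: fast equality lookups, but no key order is kept.
hash_index = dict(rows)

# "B+ tree" stand-in: keys kept sorted, like the ordered leaf level.
sorted_keys = sorted(rows)

def hash_lookup(key):
    """Equality lookup through the hash table."""
    return hash_index.get(key)

def range_lookup(low, high):
    """Return row ids for low <= key <= high using binary search."""
    start = bisect_left(sorted_keys, low)
    end = bisect_right(sorted_keys, high)
    return [rows[k] for k in sorted_keys[start:end]]

def prefix_lookup(prefix):
    """Prefix match (like 'prefix%') by seeking into the sorted keys."""
    start = bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        result.append(rows[key])
    return result
```

The range and prefix lookups only work because the keys are ordered; the dict has no equivalent operation short of scanning every key.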

4. You mentioned that a B+ tree can avoid going back to the table with a clustered index or a covering index. What is a clustered index?

A leaf node of a B+ tree index may store only the key value, or it may store the key value together with the entire row of data. The latter is a clustered index and the former is a non-clustered index. In InnoDB, only the primary key index is a clustered index. If there is no primary key, InnoDB chooses a unique key to build the clustered index; if there is no unique key either, it implicitly generates a key to build it.

When a query uses the clustered index, the entire row is available at the matching leaf node, so no back-to-table query is needed.

5. Does a query on a non-clustered index always go back to the table?

Not necessarily. It depends on whether every field the query needs is in the index. If they all are, there is no need to go back to the table.

A simple example: suppose we have created an index on the age column of the employee table. When we run select age from employee where age < 20, the leaf nodes of the index already contain the age values, so the query does not go back to the table.
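The employee example can be imitated with a small Python model. It is a simplified sketch under assumptions: the clustered index is a dict from primary key to full row, the secondary index on age is a sorted list of (age, primary key) pairs, and select_where_age_lt is a hypothetical helper that counts how many times the query has to go back to the table.

```python
from bisect import bisect_left

# Hypothetical employee table; the "clustered index" maps primary key -> full row.
clustered = {
    1: {"id": 1, "name": "Ann", "age": 18},
    2: {"id": 2, "name": "Bob", "age": 35},
    3: {"id": 3, "name": "Cid", "age": 19},
}

# "Secondary index" on age: sorted (age, primary key) pairs and nothing else.
age_index = sorted((row["age"], pk) for pk, row in clustered.items())

def select_where_age_lt(columns, limit_age):
    """Return (result rows, number of back-to-table lookups) for age < limit_age."""
    end = bisect_left(age_index, (limit_age,))
    covered = set(columns) <= {"age", "id"}
    lookups = 0
    result = []
    for age, pk in age_index[:end]:
        if covered:
            # The index entry already holds everything that was requested.
            entry = {"age": age, "id": pk}
        else:
            # Not covered: fetch the full row through the clustered index.
            entry = clustered[pk]
            lookups += 1
        result.append({c: entry[c] for c in columns})
    return result, lookups
```

Selecting only age is answered from the index entries, while selecting name forces one clustered-index lookup per matching row.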

6. What factors should be considered when creating an index?

In general, consider how often a field is used: fields that frequently appear as query conditions are good candidates. If you need a joint index, also consider the order of its columns. Other aspects matter too, such as avoiding too many indexes, which puts extra pressure on the table. All of these depend on the actual table structure and queries.

7. What is a joint index? Why does the column order in a joint index matter?

MySQL can build one index on several fields at once, which is called a joint index. For a query to hit a joint index, its conditions must use the fields in the order in which the index was built; otherwise the index cannot be hit.

The specific reasons are:

MySQL relies on the index being ordered. Suppose we create a joint index on "name, age, school". The index is sorted by name first; entries with the same name are sorted by age; and entries with the same age are sorted by school.

At query time, the index is strictly ordered only by name, so the name field must be matched with an equality condition first. Within the matching entries the age field is strictly ordered, so age can then be used for the index lookup, and so on. Therefore, pay attention to column order when creating a joint index. In general, put frequently queried or highly selective columns first. Beyond that, adjust for specific queries or table structures case by case.
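The ordering argument above can be checked with a sorted list of tuples in Python. This is a sketch only: the sample entries and the helpers seek_prefix and scan_by_age are invented for illustration, and a flat sorted list is a stand-in for the real B+ tree.

```python
from bisect import bisect_left

# Hypothetical joint index on (name, age, school): entries sorted as tuples.
index = sorted([
    ("Li", 20, "A"), ("Li", 22, "B"), ("Wang", 19, "A"),
    ("Wang", 22, "C"), ("Zhao", 20, "B"),
])

def seek_prefix(prefix):
    """Binary-search to the first entry >= prefix, then read matching entries."""
    n = len(prefix)
    start = bisect_left(index, prefix)
    matches = []
    for entry in index[start:]:
        if entry[:n] != prefix:
            break
        matches.append(entry)
    return matches

def scan_by_age(age):
    """age is not a leading column, so every entry has to be examined."""
    return [entry for entry in index if entry[1] == age]

# The age values are ordered only within one name, not across the whole index.
ages_in_index_order = [entry[1] for entry in index]
```

A lookup on name, or on name plus age, can seek directly to its starting position, whereas a lookup on age alone finds the matching ages scattered through the index.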

8. How can I tell whether an index I created is being used? Or how can I find out why a statement runs slowly?

MySQL provides the explain command to view a statement's execution plan. Before executing a statement, MySQL runs it through the query optimizer; the result of that analysis is the execution plan, which contains a lot of information. You can check whether an index was hit from the index-related fields, such as possible_keys, key, and key_len, which show the indexes that might be used, the index actually used, and the length of the index used.

9. In what situations is an index on a column not used when querying that column?

The query uses a not-equal condition.
The column takes part in a mathematical operation or a function call.
A string like pattern starts with a wildcard, such as like '%aaa'.
MySQL estimates that a full table scan would be faster than using the index.
With a joint index, when an earlier column is queried with a range condition, the later columns cannot use the index even if the leftmost-prefix rule is satisfied.

In the cases above, MySQL generally cannot make effective use of the index.

Transaction-related

1. What is a transaction?

The classic example for understanding transactions is a bank transfer. I believe everyone knows it, so I will not repeat it here.

A transaction is a series of operations that must comply with the ACID properties. The most common understanding is that the operations in a transaction either all succeed or all fail, but that alone is not enough.

2. What is ACID? Can you explain it in detail?

A=Atomicity

Atomicity is what was described above: either all operations succeed or all fail. It is impossible for only part of the operations to be carried out.

C=Consistency

The system (database) always moves from one consistent state to another consistent state; there is no intermediate state.

I=Isolation

Isolation: generally speaking, a transaction's changes are not visible to other transactions until it is fully committed. Note the word "generally" at the start: it means there are exceptions.

D=Durability

Durability: once a transaction is committed, its result stays that way; even a system crash does not affect the outcome of the transaction.

3. What problems occur when multiple transactions run at the same time?

Concurrent transactions can generally cause the following problems:

Dirty read: transaction A reads content that transaction B has not committed, and B is later rolled back.
Non-repeatable read: when transaction A is set to read only what transaction B has committed, two identical queries inside A can return different results, because B committed a change in between.
Phantom read: transaction A reads a range of rows while transaction B inserts rows into that range, producing a "phantom".

4. How do we solve these problems? Do you understand MySQL's transaction isolation levels?

MySQL's four isolation levels are as follows:

Read uncommitted (READ UNCOMMITTED)

This is the exception mentioned above. At this isolation level, a transaction can see changes that other transactions have not yet committed. It therefore allows dirty reads (reading another transaction's uncommitted changes, after which that transaction is rolled back).

This level has no significant performance advantage but many problems, so it is rarely used.

Read committed (READ COMMITTED)

A transaction can only read what other transactions have already committed. This isolation level has the non-repeatable read problem: reading twice within the same transaction can give different results, because another transaction has modified the data in between.

Repeatable read (REPEATABLE READ)

Repeatable read solves the non-repeatable read problem above (as the name suggests), but a new problem remains: the phantom read. For example, when you read the rows with id > 10, all of the rows involved are read-locked. Another transaction then inserts a new row with id = 11. Because the row is newly inserted, it does not trigger the lock exclusion above. When this transaction queries again, it finds a row with id = 11 that the previous query did not return, and an insert of that id at this point fails with a primary key conflict.

Serializable (SERIALIZABLE)

This is the highest isolation level and solves all of the problems above, because it forces operations to execute serially. This causes concurrent performance to drop sharply, so it is not very common.
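The difference between read committed and repeatable read can be illustrated with a very small Python model. It is a deliberate simplification and not InnoDB's actual mechanism: here repeatable read just copies the committed data at the transaction's first read, and the classes Store and Transaction and the function read_twice are hypothetical names.

```python
class Store:
    """Hypothetical committed data shared by all transactions."""

    def __init__(self, data):
        self.committed = dict(data)


class Transaction:
    def __init__(self, store, level):
        self.store = store
        self.level = level      # "READ COMMITTED" or "REPEATABLE READ"
        self.snapshot = None

    def read(self, key):
        if self.level == "READ COMMITTED":
            # Every read sees whatever is committed right now.
            return self.store.committed[key]
        # REPEATABLE READ: freeze a snapshot at the first read and reuse it.
        if self.snapshot is None:
            self.snapshot = dict(self.store.committed)
        return self.snapshot[key]


def read_twice(level):
    """Transaction A reads, transaction B commits an update, A reads again."""
    store = Store({"balance": 100})
    tx_a = Transaction(store, level)
    first = tx_a.read("balance")
    store.committed["balance"] = 50   # transaction B commits its change
    second = tx_a.read("balance")
    return first, second
```

Under read committed the two reads differ, which is the non-repeatable read; under repeatable read both reads return the value seen first.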

5. What isolation level does InnoDB use?

InnoDB uses repeatable read as its default isolation level.

6. Do you understand MySQL locks?

When a database has concurrent transactions, data may become inconsistent, so some mechanism is needed to keep access orderly. Locking is such a mechanism.

It is like hotel rooms: if people could walk in and out at will, several people would end up fighting over the same room. With a lock on the door, only the person who has obtained the key can enter and lock the room, and others can use it only after that person has finished.

7. What kinds of locks does MySQL have? Doesn't locking like the above hurt concurrency?

By type, there are shared locks and exclusive locks.

Shared lock: also called a read lock. When a user wants to read data, a shared lock is placed on it. Multiple shared locks can be held on the same data at the same time.

Exclusive lock: also called a write lock. When a user wants to write data, an exclusive lock is placed on it. Only one exclusive lock can be held at a time, and it is mutually exclusive with other exclusive locks and with shared locks.

In the hotel example, users behave in two ways. One is viewing a room: several users viewing it together is acceptable. The other is actually staying the night: during that time nobody else can stay in the room or view it.

Lock granularity depends on the storage engine: row-level, page-level, and table-level locks all exist among MySQL's engines, and InnoDB supports row-level and table-level locking.

Going from row to page to table, locking overhead decreases and so does concurrency.
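The shared and exclusive rules above amount to a small compatibility table. The following Python sketch shows one way to express it; RowLock and its methods are hypothetical, and where a real engine would make the requester wait, this toy version simply returns False.

```python
# (mode already held by another transaction, requested mode) -> compatible?
COMPATIBLE = {
    ("S", "S"): True,    # many readers may share the same data
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


class RowLock:
    """Toy lock on a single row; modes are "S" (shared) and "X" (exclusive)."""

    def __init__(self):
        self.holders = {}   # transaction id -> mode it holds

    def try_acquire(self, tx, mode):
        for other, held in self.holders.items():
            if other != tx and not COMPATIBLE[(held, mode)]:
                return False    # a real engine would block the caller here
        # Keep an exclusive lock the transaction already owns.
        if self.holders.get(tx) != "X":
            self.holders[tx] = mode
        return True

    def release(self, tx):
        self.holders.pop(tx, None)
```

Two transactions can both take "S" on the same row, but an "X" request succeeds only when no other transaction holds any lock on it.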

Table Design

1. Why should we always try to define a primary key?

The primary key is how the database guarantees that every row in a table is unique. Even if the business has no natural primary key for a table, it is recommended to add an auto-increment ID column as the primary key. With a primary key, later deletes, updates, and queries can be faster, and the scope of each operation is guaranteed to be safe.

2. Should the primary key be an auto-increment ID or a UUID?

An auto-increment ID is recommended; do not use a UUID.

In the InnoDB storage engine the primary key index is the clustered index, which means the leaf nodes of the primary key B+ tree store the primary key together with the full row data, in order. If the primary key is an auto-increment ID, new rows only need to be appended at the end. If it is a UUID, each incoming ID may be larger or smaller than the existing ones, which causes many insertions in the middle and a lot of data movement, leads to heavy fragmentation, and degrades insert performance.

In short, once the data volume is large, an auto-increment primary key performs better.
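The data movement described above can be made visible with a rough simulation. It is a sketch under strong assumptions: one flat sorted list stands in for the ordered leaf level (a real B+ tree splits pages instead of shifting one big array), the UUIDs are generated from a seeded random generator so the run is repeatable, and count_shifts is a hypothetical helper.

```python
import random
import uuid
from bisect import bisect_left


def count_shifts(keys):
    """Insert keys into a sorted list and count entries moved to make room."""
    leaf = []
    shifts = 0
    for key in keys:
        pos = bisect_left(leaf, key)
        shifts += len(leaf) - pos   # everything after pos has to move
        leaf.insert(pos, key)
    return shifts


n = 1000
auto_increment_keys = list(range(1, n + 1))

rng = random.Random(42)             # fixed seed, assumption for repeatability
uuid_keys = [str(uuid.UUID(int=rng.getrandbits(128))) for _ in range(n)]

sequential_shifts = count_shifts(auto_increment_keys)
random_shifts = count_shifts(uuid_keys)
```

Sequential keys always land at the end and move nothing, while random keys land in the middle and keep pushing existing entries aside.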

Figure from "High Performance MySQL": tables with the default suffix use an auto-increment ID and tables with the _uuid suffix use a UUID primary key; the test measures insert performance for 1 million and 3 million rows.


About the primary key being the clustered index: if there is no primary key, InnoDB selects a unique key as the clustered index, and if there is no unique key, it generates a primary key implicitly.

If you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index.

If you do not define a PRIMARY KEY for your table, MySQL picks the first UNIQUE index that has only NOT NULL columns as the primary key and InnoDB uses it as the clustered index.

3. Why is it required to define fields as not null?

The official MySQL documentation says:

NULL columns require additional space in the row to record whether their values are NULL. For MyISAM tables, each NULL column takes one bit extra, rounded up to the nearest byte.

A null value uses more bytes, and it can also cause many behaviors in the program that do not match expectations.

4. If you want to store a user's password hash, what kind of field should be used?

Fixed-length strings such as a password hash, a salt, or a user ID card number should be stored as char rather than varchar, which saves space and improves retrieval efficiency.

Storage Engine Related

1. What storage engines does MySQL support?

MySQL supports multiple storage engines, such as InnoDB, MyISAM, Memory, and Archive. In most cases InnoDB is the most appropriate choice, and it is MySQL's default storage engine.

2. What are the differences between InnoDB and MyISAM?

InnoDB supports transactions; MyISAM does not.
InnoDB supports row-level locking; MyISAM supports only table-level locking.
InnoDB supports MVCC; MyISAM does not.
InnoDB supports foreign keys; MyISAM does not.
InnoDB did not support full-text indexes before MySQL 5.6, while MyISAM does support them.

Miscellaneous questions

1. What is the difference between varchar and char in MySQL?

char is a fixed-length field: if you declare char(10), the field occupies 10 characters no matter how much content is actually stored. varchar is variable-length: the declared length is only the maximum, and the space occupied is the actual length plus 1 or 2 bytes that record how long the stored value is.

In terms of retrieval efficiency, char > varchar. So if the length of a field's values is fixed, use char; otherwise prefer varchar. For example, the MD5 hash of a user's password should be stored as char.

2. What do varchar(10) and int(10) mean?

The 10 in varchar(10) is the declared length, which is the maximum length of data that can be stored. The 10 in int(10) is only the display width: values shorter than 10 digits are padded with zeros (when the column is declared zerofill). In other words, int(1) and int(10) can store the same range of numbers and occupy the same space; they differ only in how values are padded for display.
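The point about display width can be shown with a tiny Python sketch. It assumes a signed 4-byte INT for the range check and imitates zero padding with str.zfill; fits_int and zerofill_display are hypothetical helpers, not MySQL functions.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1   # range of a signed 4-byte INT


def fits_int(value):
    """The storable range does not depend on the display width."""
    return INT_MIN <= value <= INT_MAX


def zerofill_display(value, width):
    """Pad a non-negative value with leading zeros up to the display width."""
    return str(value).zfill(width)
```

The same stored value 42 would display as 0000000042 with width 10 and as 42 with width 1, while the range check gives the same answer for both.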

3. How many binlog formats does MySQL have? What are the differences between them?

There are three formats: statement, row, and mixed.

In statement mode, the unit of recording is the statement: every SQL statement that changes data is recorded. Because SQL executes in a context, the related context information must be saved as well, and some statements that use certain functions cannot be replicated correctly from the recorded statement.
In row mode, the unit of recording is the change to each row, so basically everything can be recorded. However, many operations change a large number of rows (such as alter table), so this mode saves too much information and the log volume is large.
mixed is a compromise: ordinary operations are recorded as statements, and row is used when statement cannot be used.

In addition, newer versions of MySQL have optimized row mode: when the table structure changes, the statement is recorded instead of recording row by row.

4. How do you handle very large pagination offsets?

Very large pagination is generally tackled from two directions.

At the database level, which is where we mainly focus (although the effect is not that large): a query such as select * from table where age > 20 limit 1000000,10 has room for optimization. This statement loads 1,000,000 rows, discards essentially all of them, and keeps only 10, so of course it is slow. We can rewrite it as select a.* from table a join (select id from table where age > 20 limit 1000000,10) b on a.id = b.id (a join is used because MySQL does not accept limit inside an in subquery). This still walks over a million entries, but thanks to index covering, all the fields the subquery needs are in the index, so it is fast. And if the IDs are continuous, we can also use select * from table where id > 1000000 limit 10, which is efficient as well. There are many possible optimizations, but the core idea is the same: reduce the amount of data loaded.
From the requirements side, reduce this kind of request. The main points are to avoid such requirements (jumping directly to a specific page millions of rows in; allow only page-by-page browsing or following a given route, which is predictable and cacheable) and to prevent continuous IDs from leaking and being attacked maliciously.

In practice, large pagination is mainly solved by caching: predictable content is looked up in advance and cached in a key-value store such as redis, then returned directly.

Alibaba's "Java Development Manual" gives a solution for large pagination that is similar to the first approach above.
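The cost difference between an offset page and an id > ... page can be sketched with a list in Python. This is a toy model under assumptions: the table is an in-memory list with consecutive IDs, slicing by position stands in for a primary-key index seek, and page_by_offset and page_by_key are hypothetical helpers that report how many rows they had to examine.

```python
# Hypothetical table with consecutive primary keys 1..N.
N = 100_000
table = [{"id": i, "payload": f"row-{i}"} for i in range(1, N + 1)]


def page_by_offset(offset, size):
    """limit offset,size: walk over and discard `offset` rows first."""
    examined = 0
    page = []
    for row in table:
        examined += 1
        if examined > offset:
            page.append(row)
            if len(page) == size:
                break
    return page, examined


def page_by_key(last_id, size):
    """where id > last_id limit size: jump straight to the start position."""
    start = last_id   # ids are consecutive, so id k sits at position k - 1
    page = table[start:start + size]
    return page, len(page)
```

Both functions return the same ten rows for a deep page, but the offset version examines every row before the page while the key-based version examines only the rows it returns.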


5. Have you paid attention to how long SQL takes in your business systems? Do you collect slow query statistics? How do you optimize slow queries?

In business systems, apart from queries by primary key, I test the time taken by every other query on a test database. Slow query statistics are mainly collected by the operations team, who regularly report the slow queries in the business back to us.

To optimize a slow query, first figure out why it is slow. Did the query miss the index? Did it load data columns it does not need? Or is there simply too much data?

So optimization also follows these three directions:

First, analyze the statement to see whether it loads extra data: it may query redundant rows and discard them, or load many columns that the result does not need. Analyze the statement and rewrite it.
Analyze the statement's execution plan to see how it uses indexes, then modify the statement or the index so that the statement hits an index as much as possible.
If the statement cannot be optimized any further, consider whether the table simply holds too much data; if so, split the table horizontally or vertically.

6. You mentioned horizontal and vertical table splitting above. Can you give a suitable example of each?

Horizontal splitting divides a table by rows. Suppose we have a user table whose primary key is an auto-increment ID that also serves as the user ID. The data volume is large, more than 100 million rows, so querying a single table performs poorly. We can split the table by primary key ID, either by the trailing digits or by ID range. Suppose we split into 100 tables by the endings 00-99; each table then holds only about 1 million rows, and query efficiency is certain to meet the requirements.
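Splitting by the trailing digits of the ID comes down to a modulo operation. A minimal Python sketch follows; the table names user_00 to user_99 and the helper shard_table are assumptions made for the illustration.

```python
from collections import Counter

SHARD_COUNT = 100   # tables user_00 .. user_99 (hypothetical names)


def shard_table(user_id):
    """Route a row to a table by the last two digits of its ID."""
    return f"user_{user_id % SHARD_COUNT:02d}"


# With consecutive auto-increment IDs the rows spread evenly across the tables.
distribution = Counter(shard_table(i) for i in range(1, 100_001))
```

Every query that carries the user ID can compute the target table directly, so it only touches one of the 100 smaller tables.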

Vertical splitting divides a table by columns. Suppose we have an article table with the fields id, summary, and content. The system displays articles as a refreshing list that contains only the title and summary; the body content is needed only when a user clicks into an article's details. In this case, if the data volume is large, keeping a large and rarely used column together with the others slows down queries on the original table. We can split the table in two: id-summary and id-content. When the user clicks for details, the content is fetched with one more lookup by primary key. The only cost is a small increase in storage for the duplicated primary key field, which is a small price.

Of course, splitting tables is tightly coupled to the business. Before splitting, be sure to do research and benchmarks; do not act blindly on your own guesses.

7. What is a stored procedure? What are its advantages and disadvantages?

A stored procedure is a set of precompiled SQL statements. 1. Put more plainly, a stored procedure can be seen as a recorded set: a code block made up of SQL statements that, like a method, implements some function (create, read, update, or delete on one or more tables). You give the block a name and call it whenever you need that function. 2. A stored procedure is a precompiled code block, so it executes relatively efficiently. One stored procedure can replace a large number of SQL statements, which reduces network traffic and improves communication speed, and it can ensure data security to a certain extent.

However, stored procedures are not really recommended in Internet projects. A well-known example is Alibaba's "Java Development Manual", which prohibits them. My understanding is that Internet projects iterate quickly, have relatively short life cycles, and see more frequent staff turnover than traditional projects. Under those conditions stored procedures are not easy to manage, and they are also less reusable than code written in the service layer.

8. Talk about the three normal forms

First normal form: every column is atomic and cannot be subdivided.
Second normal form: non-primary-key columns depend fully on the primary key, not on only part of it.
Third normal form: non-primary-key columns depend only on the primary key, not on other non-primary-key columns.

When designing a database schema, try to follow the three normal forms. If you do not, there must be a good reason, such as performance. In practice we often compromise on database design for the sake of performance.

9. What is the difference between #{} and ${} in MyBatis?

A strange question that wandered in... I just wanted to record this one separately because it comes up so often.

#{} passes the content as a string parameter, while ${} splices the value directly into the SQL statement.

So #{} can prevent SQL injection attacks to some extent.
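The difference between binding a value and splicing it into the SQL text can be reproduced outside MyBatis. The sketch below is only an analogy: it uses Python's built-in sqlite3 module rather than MyBatis or MySQL, and the users table and the malicious input are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

malicious = "nobody' OR '1'='1"

# Bound parameter: the value is passed separately and never parsed as SQL.
bound_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()

# Spliced value: the quotes inside the input become part of the statement.
spliced_sql = "SELECT name FROM users WHERE name = '" + malicious + "'"
spliced_rows = conn.execute(spliced_sql).fetchall()
```

The bound query looks for a user literally named with the whole malicious string and finds nothing, while the spliced query turns into a condition that is always true and returns every row.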
