MySQL Interview FAQ summary

What is MySQL?

MySQL is a relational database that is very common in Java enterprise development, because it is open source, free, and easy to scale. Alibaba's database systems also make heavy use of MySQL, so its stability is well proven. Because MySQL is open source, anyone can download it under the GPL (General Public License) and modify it to fit their own needs. MySQL's default port number is 3306.

Storage Engine

Some commonly used commands

View all MySQL storage engines provided

mysql> show engines;

 

As the output shows, the current default MySQL storage engine is InnoDB, and in version 5.7 InnoDB is the only transactional storage engine provided, which means that only InnoDB supports transactions.

View the current default MySQL storage engine

We can also use the following command to view the default storage engine.

mysql> show variables like '%storage_engine%';

View table storage engine

show table status like "table_name" ;

 

The difference between MyISAM and InnoDB

MyISAM was MySQL's default database engine before version 5.5. It offers good performance and provides a number of features, including full-text indexing, compression, and spatial functions, but MyISAM does not support transactions or row-level locking, and its biggest flaw is that it cannot recover safely after a crash. In version 5.5, however, MySQL introduced InnoDB (a transactional database engine) as the default storage engine.

Most of the time we use the InnoDB storage engine, but MyISAM is also suitable in some cases, such as read-intensive workloads (provided you can live with MyISAM's crash-recovery problems).

A comparison of the two:

  1. Row-level locking: MyISAM supports only table-level locking, while InnoDB supports both row-level locking and table-level locking, defaulting to row-level locking.
  2. Transactions and crash-safe recovery: MyISAM emphasizes performance; each query executes atomically, making it faster than InnoDB, but it provides no transaction support. InnoDB provides advanced database features such as transactions and foreign keys; it is a transaction-safe (ACID-compliant) table type with commit, rollback, and crash-recovery capabilities.
  3. Foreign keys: MyISAM does not support them; InnoDB does.
  4. MVCC: only InnoDB supports it. Under highly concurrent transactions, MVCC is more efficient than plain locking. MVCC works only under the READ COMMITTED and REPEATABLE READ isolation levels; it can be implemented with either optimistic or pessimistic locking; and each database's MVCC implementation differs. Recommended reading: MySQL InnoDB MVCC multi-version concurrency control.
  5. ......
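As a quick illustration of the comparison above, the storage engine can be chosen per table at creation time and inspected afterwards (the table and column names here are made up for the example):

```sql
-- Hypothetical tables; ENGINE picks the storage engine per table.
CREATE TABLE articles_myisam (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(200)
) ENGINE=MyISAM;

CREATE TABLE orders_innodb (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  amount DECIMAL(10,2)
) ENGINE=InnoDB;

-- Check which engine each table in the current database uses:
SELECT table_name, engine
FROM information_schema.tables
WHERE table_schema = DATABASE();
```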

The book "High Performance MySQL" puts it this way:

Do not trust the "voice of experience" claiming that "MyISAM is faster than InnoDB"; this conclusion is rarely absolute. In many scenarios, InnoDB leaves MyISAM far behind, especially where a clustered index can be used or where the data being accessed fits in memory.

Under normal circumstances, choosing InnoDB is the right call. But if you do not care about scalability and concurrency, do not need transaction support, and are not worried about recovery after a crash, then MyISAM is also a good choice. In most cases, however, all of these issues need to be considered.

Character set and collation

A character set defines the mapping from characters to binary encodings. A collation defines the sort order of characters under a given character set. In MySQL, each character set corresponds to a series of collations.

MySQL uses an inheritance-like mechanism to specify character-set defaults: each database and each table has its own default value, inherited layer by layer. For example, the default character set of a database applies to all tables in that database that do not specify a character set of their own. (Adapted from "The Way of the Java Engineer".)
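A minimal sketch of this inheritance (database and table names are hypothetical): the first table inherits the database default, while an explicit CHARACTER SET overrides it.

```sql
-- Database-level default character set and collation.
CREATE DATABASE demo_db CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

USE demo_db;

-- No character set specified: inherits utf8mb4 from the database default.
CREATE TABLE t_default (name VARCHAR(50));

-- Explicit character set: overrides the database default.
CREATE TABLE t_latin (name VARCHAR(50)) CHARACTER SET latin1;

-- Inspect the result:
SHOW CREATE TABLE t_default;
```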

For details, see: Understanding MySQL character sets and collations

Indexes

MySQL indexes mainly use two data structures: the BTree index and the hash index. The underlying data structure of a hash index is a hash table, so when the vast majority of queries are single-record lookups, a hash index gives the fastest query performance. For most other scenarios, a BTree index is recommended.
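For example, the MEMORY engine lets you choose the index type explicitly (this hypothetical table is only a syntax sketch; InnoDB itself builds hash indexes only internally, via its adaptive hash index):

```sql
-- MEMORY tables support both index types; USING makes the choice explicit.
CREATE TABLE session_cache (
  session_id CHAR(32) NOT NULL,
  user_id INT,
  PRIMARY KEY (session_id) USING HASH,   -- fast single-record lookups
  INDEX idx_user (user_id) USING BTREE   -- supports range scans
) ENGINE=MEMORY;
```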

The BTree index in MySQL uses the B+Tree structure, but the two main storage engines implement it differently.

  • MyISAM: the data field of a B+Tree leaf node holds the address of the data record. During an index lookup, the B+Tree is searched first; if the given key exists, the value of its data field is read, and the address it contains is then used to read the corresponding data record. This is called a "non-clustered index".
  • InnoDB: its data file is itself the index file. Unlike MyISAM, where the index files and data files are separate, the InnoDB table data file is itself organized as a B+Tree, and the data field of a leaf node holds the complete data record. The key of this index is the table's primary key, so the InnoDB table data file itself is the primary index. This is called a "clustered index". All remaining indexes are secondary indexes, whose data fields store the primary key of the record rather than its address; this is another difference from MyISAM. A lookup by primary key reaches the node holding the data directly; a lookup via a secondary index first retrieves the primary key value, then traverses the primary index again. For this reason, when designing tables it is not recommended to use an overly long field as the primary key, nor a non-monotonic field, as either causes the primary index to split frequently. (Adapted from "The Way of the Java Engineer".)
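A small sketch of the point about secondary indexes (the table is hypothetical): a lookup by name first finds the primary key in the secondary index, then walks the clustered index to fetch the full row.

```sql
CREATE TABLE user (
  id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- clustered index key
  name VARCHAR(50),
  age INT,
  INDEX idx_name (name)                           -- secondary index: stores (name, id)
) ENGINE=InnoDB;

-- Goes straight to the leaf of the clustered index:
SELECT * FROM user WHERE id = 42;

-- Finds id in idx_name, then looks that id up in the clustered index:
SELECT * FROM user WHERE name = 'alice';
```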

More details about indexes can be found in the detailed index summary page under the MySQL directory of this document.

Use the query cache

When a query is executed, the cache is checked first. Note, however, that the query cache was removed in MySQL 8.0 because the feature proved impractical.

Adding the following configuration to my.cnf and restarting MySQL turns on the query cache:

query_cache_type=1
query_cache_size=600000

The query cache can also be enabled by executing the following commands in MySQL:

set global  query_cache_type=1;
set global  query_cache_size=600000;

With the query cache enabled as above, an identical query will return its result directly from the cache. "Identical" here covers more than the query text itself: it includes information that may affect the result, such as the current database being queried and the client protocol version. So two queries that differ by even a single character will cause a cache miss. In addition, if a query contains any user-defined functions, stored functions, user variables, temporary tables, or tables in the mysql system database, its result will not be cached.

Once results are cached, MySQL's query cache tracks every table involved in each query; if a table (or its structure) changes, all cached data related to that table is invalidated.

Although the cache can improve query performance, it also brings overhead: every query must perform a cache operation, and caches must be destroyed on invalidation. Therefore, enable the query cache with care, especially for write-intensive applications. If you do enable it, control the cache size sensibly; in general, a few dozen MB is appropriate. You can also control per-query caching with sql_cache and sql_no_cache:

select sql_no_cache count(*) from usr;

What is a transaction?

A transaction is a logical group of operations that either all execute or none execute at all.

The classic example of a transaction is a bank transfer. If Xiao Ming transfers 1,000 yuan to Xiao Hong, the transfer involves two key operations: decreasing Xiao Ming's balance by 1,000 yuan and increasing Xiao Hong's balance by 1,000 yuan. If something suddenly goes wrong between the two operations, such as the banking system crashing, Xiao Ming's balance would have decreased while Xiao Hong's balance had not yet increased, which would be wrong. A transaction guarantees that these two key operations either both succeed or both fail.
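The transfer can be sketched in SQL (the accounts table and column names are hypothetical); if anything fails between the two updates, ROLLBACK undoes both:

```sql
START TRANSACTION;

UPDATE accounts SET balance = balance - 1000 WHERE owner = 'Xiao Ming';
UPDATE accounts SET balance = balance + 1000 WHERE owner = 'Xiao Hong';

-- Either both updates take effect together...
COMMIT;

-- ...or, on error, neither takes effect:
-- ROLLBACK;
```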

The four properties of transactions (ACID)

 

  1. Atomicity:  a transaction is the smallest unit of execution and cannot be divided. Atomicity guarantees that a transaction's operations either all complete or have no effect at all;
  2. Consistency (Consistency):  the data is consistent before and after a transaction; multiple transactions reading the same data see the same result;
  3. Isolation (Isolation):  under concurrent access to the database, one user's transaction is not disturbed by other transactions; concurrent transactions are independent of each other;
  4. Durability (Durability):  once a transaction is committed, its changes to the database are permanent, and even a database failure should have no impact on them.

What problems can concurrent transactions cause?

In a typical application, multiple transactions run concurrently and often operate on the same data to complete their tasks (multiple users operating on the same data). Concurrency is necessary, but it can cause the following problems.

  • Dirty read (Dirty read):  when one transaction is accessing data and has modified it, but the modification has not yet been committed to the database, another transaction reads and uses the same data. Because the data is uncommitted, what the second transaction reads is "dirty data", and operations based on that dirty data may be incorrect.
  • Lost update (Lost to modify):  one transaction reads some data, another transaction also accesses it, the first transaction modifies it, and then the second transaction modifies it as well, so the first transaction's modification is lost. For example: transaction 1 reads A = 20, transaction 2 also reads A = 20, transaction 1 sets A = A - 1, transaction 2 sets A = A - 1; the final result is A = 19, and transaction 1's modification is lost.
  • Non-repeatable read (Unrepeatable read):  a transaction reads the same data multiple times, and before it finishes, another transaction modifies the data. Between the two reads in the first transaction, the second transaction's modification causes the first transaction to read different values each time. This situation, where two reads within one transaction return different data, is called a non-repeatable read.
  • Phantom read (Phantom read):  similar to a non-repeatable read. It occurs when one transaction (T1) reads several rows of data, then another concurrent transaction (T2) inserts some rows. In a subsequent query, the first transaction (T1) finds extra records that did not previously exist, as if it were hallucinating, hence the name "phantom read".

The difference between non-repeatable reads and phantom reads:

A non-repeatable read focuses on modification: the same record is read repeatedly and some column values are found to have changed. A phantom read focuses on insertion or deletion: repeated reads find that records have appeared or disappeared.

What are the transaction isolation levels? What is MySQL's default isolation level?

The SQL standard defines four isolation levels:

  • READ-UNCOMMITTED (read uncommitted):  the lowest isolation level; reading not-yet-committed changes is allowed, which may cause dirty reads, non-repeatable reads, and phantom reads.
  • READ-COMMITTED (read committed):  concurrent transactions may read only data that has already been committed; dirty reads are prevented, but non-repeatable reads and phantom reads may still occur.
  • REPEATABLE-READ (repeatable read):  repeated reads of the same field within a transaction return the same result, unless the transaction itself modifies the data; dirty reads and non-repeatable reads are prevented, but phantom reads may still occur.
  • SERIALIZABLE (serializable):  the highest isolation level, fully ACID-compliant. All transactions execute one after another, so transactions cannot interfere with each other; this level prevents dirty reads, non-repeatable reads, and phantom reads.

Isolation level     Dirty read   Non-repeatable read   Phantom read
READ-UNCOMMITTED    √            √                     √
READ-COMMITTED      ×            √                     √
REPEATABLE-READ     ×            ×                     √
SERIALIZABLE        ×            ×                     ×

(√ = can occur, × = prevented)

The default isolation level of MySQL's InnoDB storage engine is REPEATABLE-READ (repeatable read). We can check it with the SELECT @@tx_isolation; command:

mysql> SELECT @@tx_isolation;
+-----------------+
| @@tx_isolation  |
+-----------------+
| REPEATABLE-READ |
+-----------------+

Note that, unlike the SQL standard, the InnoDB storage engine uses the Next-Key Lock algorithm under the REPEATABLE-READ (repeatable read) isolation level, thereby avoiding phantom reads; this differs from other database systems (such as SQL Server). So InnoDB's default REPEATABLE-READ level already fully guarantees transaction isolation, effectively achieving the SQL standard's SERIALIZABLE (serializable) isolation level. Because a lower isolation level requires fewer locks per transaction, most database systems default to READ-COMMITTED (read committed); but it is worth knowing that InnoDB's default REPEATABLE-READ incurs no extra performance loss.

Under distributed transactions, the InnoDB storage engine generally uses the SERIALIZABLE (serializable) isolation level.
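The isolation level can also be changed per session or globally; a sketch (note that the @@tx_isolation variable shown earlier applies to MySQL 5.x, and was renamed @@transaction_isolation in 8.0):

```sql
-- Change the isolation level for the current session only:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Or globally, affecting new sessions:
SET GLOBAL TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- Verify (MySQL 5.x; use @@transaction_isolation on 8.0+):
SELECT @@tx_isolation;
```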

InnoDB locking mechanism and lock algorithm

How the MyISAM and InnoDB storage engines use locks:

  • MyISAM uses table-level lock (table-level locking).
  • InnoDB supports row-level locking (row-level locking) and table-level locking, defaulting to row-level locking.

Comparison of table-level locks and row-level locks:

  • Table-level lock:  the largest locking granularity in MySQL; it locks the entire table currently being operated on. It is simple to implement, consumes relatively few resources, locks quickly, and cannot deadlock. However, its coarse granularity gives the highest probability of lock contention and the lowest concurrency. Both the MyISAM and InnoDB engines support table-level locking.
  • Row-level lock:  the smallest locking granularity in MySQL; it locks only the row currently being operated on. Row-level locking can greatly reduce conflicts between database operations. Its fine granularity allows high concurrency, but it also has the highest locking overhead, locks slowly, and can deadlock.

For details, see: A brief look at MySQL's locking mechanism: https://blog.csdn.net/qq_34337272/article/details/80611486

The InnoDB storage engine has three locking algorithms:

  • Record lock: a lock on an individual row record
  • Gap lock: a lock on a gap, locking a range of values but not the records themselves
  • Next-key lock: record + gap, locking a range including the records themselves

Knowledge points:

  1. InnoDB uses next-key locking for row queries
  2. Next-key locking solves the phantom read problem (Phantom Problem)
  3. When the query's index has a unique attribute, the next-key lock is downgraded to a record lock
  4. Gap locks are designed to prevent multiple transactions from inserting records into the same range, which would otherwise cause phantom reads
  5. There are two ways to explicitly disable gap locks (apart from uniqueness and foreign-key constraint checks, only record locks are then used): A. set the transaction isolation level to READ COMMITTED; B. set the parameter innodb_locks_unsafe_for_binlog to 1
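A rough sketch of how next-key locking plays out (the table and data are hypothetical; REPEATABLE READ is assumed):

```sql
-- Suppose t has a non-unique index on a, with existing values 10 and 20.
CREATE TABLE t (id INT PRIMARY KEY, a INT, KEY idx_a (a)) ENGINE=InnoDB;

-- Session 1:
START TRANSACTION;
SELECT * FROM t WHERE a = 10 FOR UPDATE;
-- Next-key lock: locks the record a = 10 plus the gaps around it,
-- so another session's INSERT of, say, a = 15 would block,
-- preventing phantom reads.

-- If idx_a were a UNIQUE index instead, the next-key lock would be
-- downgraded to a record lock on a = 10 only.
```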

Large table optimization

When a single MySQL table holds too many records, CRUD performance degrades significantly. Some common optimization measures are as follows:

1. Limit the range of data queried

Always prohibit queries without any limiting condition. For example, when users query their order history, we can restrict it to a range of one month;

2. Read / write separation

The classic database-splitting scheme: the primary database handles writes, and the replicas handle reads;

3. The vertical partitioning

Split tables based on the correlation of the columns in the database. For example, if a user table contains both the user's login information and the user's basic information, it can be split into two separate tables, which can even be placed into separate databases (sub-databases).

Simply put, vertical splitting splits a data table by columns, turning a table with many columns into multiple tables.

  • Advantages of vertical splitting:  row data becomes smaller, reducing the number of blocks read per query and thus reducing I/O. In addition, vertical partitioning can simplify the table structure and ease maintenance.
  • Disadvantages of vertical splitting:  the primary key becomes redundant, and the redundant columns must be managed; it can also force join operations, which may have to be performed at the application layer. In addition, vertical partitioning makes transactions more complicated;

4. horizontal partitioning

Keep the table structure unchanged and shard the data by some policy, so that each shard of data goes into a different table or database, achieving distribution. Horizontal splitting can support very large data volumes.

Horizontal splitting splits a data table by rows. When a table exceeds two million rows, it will slow down, at which point the data in that table can be split across multiple tables. For example, we can split the user information table into multiple user information tables, so that no single table holds enough data to hurt performance.
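A minimal sketch of horizontal splitting by id (the table names and the modulo routing rule are illustrative assumptions):

```sql
-- Two shards with identical structure:
CREATE TABLE user_0 (id BIGINT PRIMARY KEY, name VARCHAR(50));
CREATE TABLE user_1 (id BIGINT PRIMARY KEY, name VARCHAR(50));

-- Routing rule decided in the application: shard = id % 2.
-- id 7 -> user_1:
INSERT INTO user_1 (id, name) VALUES (7, 'alice');
-- id 8 -> user_0:
INSERT INTO user_0 (id, name) VALUES (8, 'bob');
```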

 

Horizontal splitting can support very large data volumes. One thing to note: splitting into multiple tables only solves the problem of a single table holding too much data; if the tables remain on the same machine, it does little to improve MySQL's concurrency, so horizontal splitting is best done across databases as well.

Horizontal splitting can support very large data volumes with relatively little change on the application side, but sharding makes transactions difficult to handle, cross-node joins perform poorly, and the logic is complex. "The Way of the Java Engineer" recommends avoiding sharding where possible, because splitting brings complexity in logic, deployment, operation, and maintenance; with proper optimization, an ordinary data table can generally support up to around ten million rows without much problem. If you really must shard, prefer a client-side sharding architecture to reduce middleware and network I/O.

Two common database sharding schemes:

  • Client agent:  the sharding logic lives on the application side, packaged in a jar, implemented by modifying or wrapping the JDBC layer. Dangdang's Sharding-JDBC and Alibaba's TDDL are two commonly used implementations.
  • Middleware agent:  a proxy layer sits between the application and the data, and the sharding logic is maintained centrally in the middleware service. Mycat, 360's Atlas, and NetEase's DDB are implementations of this architecture.

For details, see: MySQL large table optimization: https://segmentfault.com/a/1190000006158186

Explain the idea behind pooling. What is a database connection pool? Why do we need a database connection pool?

Pooling should not be a new term: Java's thread pools, JDBC connection pools, and Redis connection pools are all implementations of this design idea. Pooling pre-allocates resources up front to offset the cost of acquiring a resource on every access, such as the overhead of creating a thread or obtaining a remote connection. It is like a school cafeteria where the server fills several lunch boxes with rice ahead of time: when you arrive, you simply take a box and eat, rather than waiting for rice and dishes to be served on the spot, which is far more efficient. Besides pre-initialized resources, pool designs also include features such as an initial size, an active size, and a maximum size, which map directly onto member attributes of Java's thread pools and database connection pools.

A database connection is essentially a socket connection. The database server maintains some cache and user-permission information for each connection, which occupies memory. A connection pool can be viewed as a buffer that maintains database connections, so that future requests needing the database can reuse these connections. Opening and maintaining a database connection for every user request, especially for a dynamic, database-driven website application, is costly and wastes resources. With a connection pool, once a connection is created it is placed in the pool and used again, so a new connection does not have to be established each time. If all connections are in use, a new connection is created and added to the pool. Connection pooling also reduces the time a user must wait for a database connection to be established.

After sharding into multiple databases and tables, how do we handle the primary key id?

Once data is split into multiple tables, each table cannot simply auto-increment from 1; that would produce duplicates, so we need a globally unique id.

There are several common ways to generate a globally unique id:

  • UUID : not suitable as a primary key, because it is too long, unordered, unreadable, and inefficient to query. It is better suited to generating unique identifiers such as file names.
  • Database auto-increment id  : set up two databases replicating each other, with auto-increment strategies that generate non-overlapping ids, to achieve high availability. Ids generated this way are ordered, but separate database instances must be deployed, the cost is high, and there is a performance bottleneck.
  • Generating ids with Redis:  good performance, flexible and convenient, and independent of the database. However, introducing a new component makes the system more complex and less available, makes coding more complex, and raises system cost.
  • Twitter's Snowflake algorithm  : GitHub address: https://github.com/twitter-archive/snowflake.
  • Meituan's Leaf distributed id generation system  : Leaf is Meituan's open-source distributed id generator. It guarantees global uniqueness, a monotonically increasing trend, and information security. The Leaf article also compares several distributed id approaches. It depends on a relational database and ZooKeeper middleware. Meituan technical team article: https://tech.meituan.com/2017/04/21/mt-leaf.html .
  • ......
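The dual-database auto-increment strategy mentioned above is usually configured so the two servers generate interleaved, non-overlapping ids; a sketch of the relevant MySQL settings:

```sql
-- Server A: generates 1, 3, 5, ...
SET GLOBAL auto_increment_increment = 2;
SET GLOBAL auto_increment_offset = 1;

-- Server B: generates 2, 4, 6, ...
SET GLOBAL auto_increment_increment = 2;
SET GLOBAL auto_increment_offset = 2;
```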

How a SQL statement is executed in MySQL

How a SQL statement is executed in MySQL

Recommended specifications for high-performance MySQL optimization

Recommended specifications for high-performance MySQL optimization

What can make a SQL statement execute very slowly?

Tencent interview: what can make a SQL statement execute very slowly? (Bukanhouhui series)



Origin blog.csdn.net/Alen_xiaoxin/article/details/104779548