The most useful MySQL interview questions, distilled from three months of interviews with countless companies

If you want to get into a big tech company, weak MySQL skills won't cut it. Take on the MySQL interview challenge below and see how far you can get.

1. Can you tell me the difference between MyISAM and InnoDB?

MyISAM was the default engine before MySQL 5.5. It supports full-text search, compression, spatial functions, etc., but it does not support transactions or row-level locks, so it is generally used in scenarios with many reads and few writes. MyISAM also does not support foreign keys, and it stores its indexes and data separately.

InnoDB is built on a clustered index. In contrast to MyISAM, it supports transactions, foreign keys, and row-level locks, and it achieves high concurrency through MVCC. Its index and data are stored together.
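
For reference, the engine can be chosen per table at creation time; a minimal sketch (the table and column names here are made up for illustration):

create table log_myisam(
    id int not null,
    msg varchar(255),
    primary key(id)
) engine=MyISAM;

create table order_innodb(
    id int not null,
    user_id int not null,
    primary key(id)
) engine=InnoDB;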

2. Tell me about MySQL indexes. What are clustered and non-clustered indexes?

By data structure, indexes are mainly B+ tree indexes and hash indexes.

Suppose we have a table with the following structure:

create table user(
    id int(11) not null,
    age int(11) not null,
    primary key(id),
    key(age)
);

A B+ tree is an ordered storage structure: smaller values are on the left and larger ones on the right. Non-leaf nodes contain only the index column (id), while leaf nodes contain both the index column and the row data. This way of storing data together with the index is called a clustered index, and a table can have only one clustered index. If no primary key is defined, InnoDB chooses a unique non-null index instead; if there is none, it implicitly defines a hidden primary key as the clustered index.


That is how a primary-key clustered index is stored, so what does a non-clustered index look like? A non-clustered (secondary) index stores the primary key value in its leaf nodes, which is different from MyISAM, where the index stores the address of the data row.

 


Finally, the key difference between InnoDB and MyISAM: in InnoDB the clustered index holds the row data and secondary indexes hold primary key values, while in MyISAM both the primary and secondary indexes store pointers to the data rows, since index and data live in separate files.

 

3. Do you know what covering indexes and back-to-table lookups are?

A covering index means that an index contains (covers) all of the fields a query needs, so the query can be answered from the index alone without going back to the table for the row data.

To check whether a query uses a covering index, just run explain on the SQL statement and see whether the Extra column shows "Using index".

Take the user table above as an example. Let's add a name field and try a couple of queries.

alter table user add column name varchar(20);    -- add the name field used in this example

explain select * from user where age=1;          -- name cannot be read from the age index, so a back-to-table lookup is needed
explain select id,age from user where age=1;     -- id and age can both be read directly from the index: covering index

4. What are the types of locks?

MySQL locks are divided into shared locks and exclusive locks, also called read locks and write locks.

A read lock is shared: multiple transactions can hold it at the same time. It can be acquired with lock in share mode; while held, the row can be read but not modified.

A write lock is exclusive: it blocks other write locks and read locks. By granularity, locks can be divided into table locks and row locks.

A table lock locks the entire table and blocks all other users' reads and writes on that table; for example, alter table locks the table while the structure is being modified.

Row locks can be used in a pessimistic or an optimistic style: pessimistic locking is implemented with for update, while optimistic locking is usually implemented with a version-number column.
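
For example, a minimal sketch of acquiring the two row-lock modes on the user table above (InnoDB, inside an explicit transaction):

begin;
select * from user where id = 1 lock in share mode;  -- shared (read) lock
-- or, for an exclusive (write) lock:
select * from user where id = 1 for update;
commit;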

5. Can you talk about the basic characteristics and isolation levels of transactions?

The basic ACID characteristics of a transaction are:

Atomicity means that the operations in a transaction either all succeed or all fail.

Consistency means the database always moves from one consistent state to another consistent state. For example, if A transfers 100 yuan to B and the system crashes in the middle of the SQL execution, A does not lose 100 yuan, because the transaction was never committed and the changes are not saved to the database.

Isolation means that a transaction's modifications are invisible to other transactions until it finally commits.

Durability means that once a transaction is committed, the changes it made are stored permanently in the database.

There are 4 isolation levels, namely:

Read uncommitted: a transaction may read data that other transactions have not yet committed; this is called a dirty read.

For example, the age of the user with id=1 should be 10, but the query reads a change that another transaction has not yet committed and gets age=20. That is a dirty read.


Read committed: a transaction only reads data that has already been committed, which solves the dirty-read problem. However, the same query can return different results within one transaction; this is called a non-repeatable read.

For example, a transaction reads the user with id=1 and gets age=10; after another transaction commits an update, reading again in the same transaction returns age=20. Getting different results from the same query inside one transaction is a non-repeatable read.


Repeatable read: this is MySQL's default isolation level. Every read within a transaction returns the same result, but phantom reads can still occur.

Serializable: rarely used in practice. It locks every row it reads, which causes a lot of timeouts and lock contention.
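
As a quick sketch, the isolation level can be checked and changed per session (the variable is tx_isolation on MySQL 5.7 and transaction_isolation on 8.0):

select @@transaction_isolation;
set session transaction isolation level read committed;
set session transaction isolation level repeatable read;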

6. What guarantee does ACID rely on?

Atomicity is guaranteed by the undo log, which records the information needed to roll back; when a transaction is rolled back, the SQL that has already executed is undone based on it.

Consistency is generally guaranteed at the application (code) level.

Isolation is guaranteed by MVCC.

Durability is guaranteed by memory plus the redo log: when MySQL modifies data it records the change both in memory and in the redo log, the redo log is flushed when the transaction commits, and data can be recovered from the redo log after a crash.

7. What is a phantom read, and what is MVCC?

To explain phantom reads we first need to understand MVCC. MVCC stands for multi-version concurrency control; in essence it keeps a snapshot of the data as of a certain point in time.

Conceptually, every row has two hidden columns: a creation version number and an expiration (deletion) version number. Each time a new transaction starts, the version number is incremented.

Taking the user table above as an example, suppose we insert two rows; conceptually they look like this.

id | name      | create_version | delete_version
1  | Zhang San | 1              |
2  | Li Si     | 2              |

Now suppose Xiao Ming executes the query, with current_version=3

select * from user where id<=3;

At the same time, Xiaohong starts a transaction to modify the record with id=1, with current_version=4

update user set name='张三三' where id=1;

After it executes successfully, the data looks like this

id | name         | create_version | delete_version
1  | Zhang San    | 1              |
2  | Li Si        | 2              |
1  | Zhang Sansan | 4              |

If Xiaohei then deletes the row with id=2, with current_version=5, the data after execution looks like this.

id | name         | create_version | delete_version
1  | Zhang San    | 1              |
2  | Li Si        | 2              | 5
1  | Zhang Sansan | 4              |

Since the MVCC rule is to return rows whose creation version is less than or equal to the current transaction version and whose deletion version is either empty or greater than the current transaction version, Xiao Ming's query is effectively:

select * from user where id<=3 and create_version<=3 and (delete_version>3 or delete_version is null);

So the name Xiao Ming reads for id=1 is still 'Zhang San', and the row with id=2 can still be read. This guarantees that a transaction only reads data that already existed before the transaction started, or that was inserted or modified by the transaction itself.

Once MVCC is understood, phantom reads are much easier to explain. A common scenario: when a user registers, we first query whether the user name already exists and insert it if it does not, assuming the user name has a unique index.

  1. Xiao Ming opens a transaction with current_version=6, queries for a record named 'Wang Wu', and finds that it does not exist.

  2. Xiaohong opens a transaction with current_version=7, inserts such a row, and the data becomes:

id | name      | create_version | delete_version
1  | Zhang San | 1              |
2  | Li Si     | 2              |
3  | Wang Wu   | 7              |
  3. Xiao Ming now inserts a record named 'Wang Wu' and hits a unique-index conflict, even though his own query saw no such row. This is a phantom read.
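
The scenario as SQL, assuming the user table has a name column with a unique index on it (as stated above):

-- Xiao Ming's transaction
begin;
select * from user where name = 'Wang Wu';        -- returns nothing under repeatable read
-- Xiaohong's transaction commits: insert into user(id, name) values(3, 'Wang Wu');
insert into user(id, name) values(4, 'Wang Wu');  -- fails with a duplicate-key error on the unique index
commit;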

8. Do you know what a gap lock is?

A gap lock only exists at the repeatable read level. Combining MVCC with gap locks solves the phantom-read problem. Taking user as an example again (with the key(age) index defined above), suppose the user table contains these records

id | age
1  | 10
2  | 20
3  | 30

When we execute:

begin;
select * from user where age=20 for update;

begin;
insert into user(age) values(10); # succeeds
insert into user(age) values(11); # fails
insert into user(age) values(20); # fails
insert into user(age) values(21); # fails
insert into user(age) values(30); # fails

Only age=10 can be inserted successfully. This is because, based on the gaps between the existing records, MySQL automatically generates the following intervals (left-open, right-closed):

(negative infinity, 10], (10, 20], (20, 30], (30, positive infinity)

Since age=20 has a matching record, the intervals (10,20] and (20,30) are locked, and no row can be inserted into or deleted from them.

What if we query age=21 instead? Then the lock is placed on the interval (20,30) (open at both ends), located from the value 21.

Note that equality lookups on a unique index do not take gap locks; they only lock the matched record.

9. What is the scale of your data? How do you split databases and tables?

Splitting is done either vertically or horizontally, and generally we split vertically first, then horizontally.

Vertical database splitting

Based on the existing microservice split, vertical database splitting is already in place: each service has its own database.


Vertical table splitting

If a table has many fields, split out the ones that are rarely used or that hold large values into a separate table.


Horizontal table splitting

First decide which field to use as the sharding_key based on the business scenario. For example, suppose we currently have 10 million orders per day and most traffic comes from the consumer (C) side, so we use user_id as the sharding_key. Queries are only supported for the last 3 months of data, and older orders are archived. Three months of data is about 900 million rows; split into 1024 tables, each table holds roughly 1 million rows.

For example, if the user id is 100, we compute hash(100) and take it modulo 1024 to determine which table the row falls into.
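
A minimal sketch of that routing calculation, using MySQL's crc32() as a stand-in hash function and assuming 1024 tables named order_0 ... order_1023 (both the hash choice and the table names are illustrative):

select mod(crc32(100), 1024) as table_index;  -- the application then reads/writes order_<table_index>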

10. How do you keep IDs unique after table sharding?

Because primary keys are auto-incremented by default, the primary keys of the sharded tables will inevitably collide across tables. There are several options:

  1. Set a step size. For example, with 1024 tables, give each table a different offset within a step of 1024 so that the keys generated by different tables never collide (see the sketch after this list).
  2. Use distributed IDs: implement your own distributed ID generation algorithm, or use an open-source one such as the snowflake algorithm.
  3. After sharding, stop using the primary key as the query condition and instead add a separate field to every table as a unique business key. For example, the order number of an order table is unique; whichever physical table a row lands in, queries (and updates) go by the order number.
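
A sketch of the step-size approach using MySQL's auto_increment_increment and auto_increment_offset variables (shown at session scope here; in practice they are usually configured per instance):

-- shard #1 of 1024: generates ids 1, 1025, 2049, ...
set @@auto_increment_increment = 1024;
set @@auto_increment_offset = 1;

-- shard #2 of 1024: generates ids 2, 1026, 2050, ...
set @@auto_increment_increment = 1024;
set @@auto_increment_offset = 2;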

11. How to deal with non-sharding_key queries after table sharding?

  1. Build a mapping table. For example, what if a merchant needs to query their order list? You cannot scan every table just because the query has no user_id. So we keep a mapping table that stores the relationship between merchants and users; when querying, first look up the merchant's users, then query the orders by user_id.
  2. Build a wide table. Merchants generally do not need strictly real-time data, so the order table can be synchronized to an offline (or real-time) data warehouse, a wide table built on top of it, and a system such as ES used to serve the queries.
  3. If the data volume is not large, such as some back-office queries, you can also scan the shards with multiple threads and then aggregate the results, or do it asynchronously, as in the snippet below.
List<Callable<List<User>>> taskList = Lists.newArrayList();
for (int shardingIndex = 0; shardingIndex < 1024; shardingIndex++) {
    final int index = shardingIndex; // a lambda can only capture an (effectively) final local variable
    taskList.add(() -> userMapper.getProcessingAccountList(index));
}
List<List<User>> list = null;
try {
    list = taskExecutor.executeTask(taskList);
} catch (Exception e) {
    // handle the failure, e.g. log and retry
}

public class TaskExecutor {
    // submit all tasks to a thread pool (via ExecutorUtil) and wait for every result
    public <T> List<T> executeTask(Collection<? extends Callable<T>> tasks) throws Exception {
        List<T> result = Lists.newArrayList();
        List<Future<T>> futures = ExecutorUtil.invokeAll(tasks);
        for (Future<T> future : futures) {
            result.add(future.get());
        }
        return result;
    }
}

12. Tell me how MySQL master-slave synchronization works.

First, the principle of MySQL master-slave synchronization:

  1. After the master commits a transaction, it writes the change to its binlog.
  2. The slave connects to the master and requests the binlog.
  3. The master creates a dump thread and pushes the binlog to the slave.
  4. The slave starts an IO thread that reads the binlog sent by the master and writes it to the relay log.
  5. The slave starts a SQL thread that reads events from the relay log and replays them on the slave, completing the synchronization.
  6. The slave records its own binlog.


Since MySQL's default replication mode is asynchronous, the master does not care whether the slave has processed the log after sending it. This leads to a problem: if the master goes down and the slave fails to process the log, then after the slave is promoted to master, that log is lost. Two alternatives address this.

Fully synchronous replication

After the master writes the binlog, the log is forcibly synchronized to the slaves, and the master only returns to the client after all the slaves have executed it. Obviously, performance suffers severely this way.

Semi-synchronous replication

Unlike full synchronization, in semi-synchronous replication the slave returns an ACK to the master after it has successfully written the log, and the master considers the write complete as soon as it receives an acknowledgment from at least one slave.
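
A sketch of enabling semi-synchronous replication with the semisync plugins (plugin and variable names as in MySQL 5.7; newer versions also ship renamed source/replica equivalents):

-- on the master
install plugin rpl_semi_sync_master soname 'semisync_master.so';
set global rpl_semi_sync_master_enabled = 1;

-- on the slave
install plugin rpl_semi_sync_slave soname 'semisync_slave.so';
set global rpl_semi_sync_slave_enabled = 1;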

13. How do you deal with master-slave delay?

There is no universal fix for this problem; it has to be handled case by case. For queries that must see the latest data, force them to read from the master database.

 


Origin blog.csdn.net/yuandengta/article/details/108943036