[Interview questions] A 40,000-word summary of MySQL interview questions



Table of contents

1. Three paradigms

2. Difference between DML statement and DDL statement

3. The difference between primary key and foreign key

4. The difference between drop, delete, and truncate

5. Infrastructure

6. What is the difference between MyISAM and InnoDB?

7. Recommend self-incrementing id as the primary key problem

8. Why MySQL's auto-increment primary key is not continuous

9. What does redo log do?

10. Timing of flushing redo log

11. How does redo log record logs?

12. What is binlog

13. Binlog record format

14. Binlog writing mechanism

15. What is the difference between redolog and binlog

16. Two-phase commit

17. What is undo log

18. What is relaylog

19. Index

20. Hash index

21. B-tree and B+ tree

22. Primary key index

23. Secondary index

24. Clustered index and non-clustered index

25. Back to table

26. Covering index and joint index

27. The leftmost prefix matching principle

28. Index push down

29. Implicit conversion

30. How to choose ordinary index and unique index?

31. Avoid index failure

32. Rules for indexing

33. Transaction ACID characteristics

34. Problems caused by concurrent transactions

35. Transaction isolation level

36. MVCC

37. Locks in Mysql

38. Query statement execution process

39. Update statement execution process

40. SQL optimization

41. Master-slave synchronization data

42. How to solve the master-slave delay

43. Why not use long transactions

44. What is adaptive hashing?

45. What are dirty reads, phantom reads and non-repeatability?

46. What are the functions of database locks and what kind of locks are there?

47. What is the relationship between isolation level and lock?

48. Lock algorithm in InnoDB?

49. Stored procedure

        1. What is a stored procedure?

        2. What is the difference between a stored procedure and a function?

50. What are the common logs in MySQL?

51. Master-slave replication

1. What is master-slave replication?

2. What is the role of master-slave replication?

3. What is the architecture of master-slave replication?

52. What is the realization principle of master-slave replication?

53. What are asynchronous replication and semi-synchronous?

54. Common problems and solutions in master-slave?


1. Three paradigms

1NF (First Normal Form): attributes (the fields in a table) must be atomic, that is, a field holds a single value and cannot be split into several other fields. 1NF is the most basic requirement of all relational databases; every table created in a relational database must satisfy the first normal form.

2NF (Second Normal Form): on the basis of 1NF, every instance (row) in the table must be uniquely distinguishable, usually via a primary key column, and every non-key attribute must fully depend on the whole primary key (no partial dependency on part of a composite key).

3NF (Third Normal Form): on the basis of 2NF, every column must depend directly on the primary key rather than indirectly (no transitive dependency), that is, the table should not carry non-key information that belongs to other tables.

During development it is not always necessary to satisfy all three normal forms; sometimes, to improve query efficiency, fields from other tables are deliberately stored redundantly (denormalization).

2. Difference between DML statement and DDL statement

  • DML (Data Manipulation Language) refers to operations on the records in database tables, mainly insertion, update, deletion, and query of table records; it is the type of statement developers use most frequently in daily work.

  • DDL (Data Definition Language) is the language for creating, dropping, and modifying objects inside the database. The biggest difference from DML is that DML only operates on the data inside tables and does not touch table definitions or structures, nor other database objects. DDL statements are used more by database administrators (DBAs) and rarely by ordinary developers.
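A minimal illustration of the two categories, using a hypothetical student table:

-- DDL: defines or alters database objects
CREATE TABLE student (id INT PRIMARY KEY, name VARCHAR(50));
ALTER TABLE student ADD COLUMN age INT;

-- DML: manipulates the rows inside a table
INSERT INTO student (id, name, age) VALUES (1, 'Tom', 18);
UPDATE student SET age = 19 WHERE id = 1;
DELETE FROM student WHERE id = 1;
SELECT * FROM student;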

3. The difference between primary key and foreign key

  • Primary key : uniquely identifies a row of data; duplicates are not allowed, NULL is not allowed, and a table can have only one primary key;

  • Foreign key : used to establish a relationship with another table; a foreign key references the primary key of another table. Foreign key values may repeat and may be NULL, and a table can have multiple foreign keys;
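A small sketch of both constraints, using hypothetical class and student tables:

CREATE TABLE class (
  id   INT PRIMARY KEY,            -- primary key: unique, not NULL, one per table
  name VARCHAR(50)
);

CREATE TABLE student (
  id       INT PRIMARY KEY,
  class_id INT,                    -- foreign key column: may repeat, may be NULL
  CONSTRAINT fk_student_class
    FOREIGN KEY (class_id) REFERENCES class (id)
);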

4. The difference between drop, delete, and truncate

(1) Different usage

  • drop (discard the table): drop table <table_name> deletes the table structure directly; used when removing a table entirely.

  • truncate (clear data): truncate table <table_name> deletes only the data in the table; when data is inserted afterwards, the auto-increment id starts from 1 again. Used to empty a table.

  • delete (delete data): delete from <table_name> where <column> = <value> deletes matching rows; if no where clause is added, the effect is similar to truncate table <table_name>.

(2) They belong to different categories of database language

  • truncate and drop are DDL (Data Definition Language) statements: the operation takes effect immediately, the original data is not placed in the rollback segment and cannot be rolled back, and the operation does not fire triggers.

  • delete is a DML (Data Manipulation Language) statement: the operation is recorded in the rollback segment and takes effect only after the transaction is committed.

(3) Execution speed is different

  • When delete is executed, binlog entries are generated; writing the log takes time, but it has the advantage of making rollback and data recovery possible.

  • truncate does not generate per-row logs when executed, so it is faster than delete. In addition, the table's auto-increment value is reset and the index is restored to its original size.

  • drop releases all the space occupied by the table.

In terms of speed: drop > truncate > delete.
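Side by side, on the hypothetical student table from above:

DELETE FROM student WHERE id = 1;   -- DML: row by row, logged, can be rolled back
TRUNCATE TABLE student;             -- DDL: empties the table, resets AUTO_INCREMENT
DROP TABLE student;                 -- DDL: removes the data and the table definition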

5. Infrastructure

The figure below is a brief architecture diagram of MySQL. From the figure below, you can clearly see how a SQL statement of the client is executed inside MySQL.

[Figure: MySQL logical architecture]

  • Connector: related to identity authentication and authority (when logging in to MySQL).

  • Query cache: When executing a query statement, it will first query the cache (removed after MySQL 8.0, because this function is not very practical).

  • Analyzer: if the cache is not hit, the SQL statement passes through the analyzer, which first works out what the SQL statement is meant to do and then checks whether its syntax is correct.

  • Optimizer: chooses what MySQL considers the optimal execution plan.

  • Executor: Execute statements and return data from the storage engine. Before executing the statement, it will judge whether it has permission. If there is no permission, it will report an error.

  • Plug-in storage engine : It is mainly responsible for data storage and reading. It adopts a plug-in architecture and supports InnoDB, MyISAM, Memory and other storage engines.

6. What is the difference between MyISAM and InnoDB?

Before MySQL 5.5, the MyISAM engine was the default storage engine of MySQL, and after MySQL 5.5, InnoDB was the default storage engine of MySQL.

(1) Whether to support row-level locks

MyISAM only has table-level locks, while InnoDB supports row-level locks and table-level locks, and the default is row-level locks.

(2) Whether to support transactions

MyISAM does not provide transaction support, InnoDB provides transaction support, implements the four isolation levels defined by the SQL standard, and has the ability to commit and rollback transactions.

The REPEATABLE-READ (repeatable read) isolation level used by InnoDB by default can solve the problem of phantom reads (based on MVCC and Next-Key Locks).

(3) Whether to support foreign keys

MyISAM does not support it, but InnoDB does.

(4) Whether to support safe recovery after abnormal database crash

MyISAM does not support it, but InnoDB does. After the database using InnoDB crashes abnormally, when the database is restarted, it will ensure that the database is restored to the state before the crash. The recovery process depends on redo log.

(5) Whether to support MVCC

MyISAM does not support it, but InnoDB does.

(6) Index implementation

Although both the MyISAM engine and the InnoDB engine use B+Tree as the index structure, the implementation methods of the two are different.

  • In the InnoDB engine, its data files are themselves index files. The table data file itself is an index structure organized by B+Tree, and the data field of the leaf node of the tree stores complete data records.

  • MyISAM index files and data files are separated, and the index stores pointers to data files.

(7) Performance difference

The performance of InnoDB is stronger than that of MyISAM. No matter in read-write mixed mode or read-only mode, as the number of CPU cores increases, InnoDB's read and write capabilities increase linearly. Because MyISAM cannot read and write concurrently, its processing power has nothing to do with the number of cores.

[Figure: InnoDB vs MyISAM performance comparison]

7. Recommend self-incrementing id as the primary key problem

  • The B+ tree of a secondary index stores the value of the primary key in its leaf nodes; if the primary key value is large, every secondary index becomes larger as a result.

  • With an auto-increment id as the primary key, newly inserted rows are simply appended at the end of the current page, giving sequential inserts with no extra maintenance.

  • Page splits are easier to avoid: a split happens when the page being inserted into is almost full. If the primary key is not an auto-increment id, rows may be inserted into the middle of pages, causing frequent page splits and data movement.

8. Why MySQL's auto-increment primary key is not continuous

  • In MySQL 5.7 and earlier versions, the self-increment value is stored in memory and not persisted;

  • Unique key conflict: when inserting data, the auto-increment counter is advanced first; if the insert then fails because of a unique-key conflict, the counter is not rolled back;

  • Transaction rollback: similar to the unique-key case, the auto-increment value is not rolled back when the transaction rolls back. The main reason for this design is performance.
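The unique-key case is easy to reproduce; a sketch (the table name t is hypothetical):

CREATE TABLE t (
  id INT AUTO_INCREMENT PRIMARY KEY,
  c  INT UNIQUE
);
INSERT INTO t (c) VALUES (1);   -- succeeds, id = 1
INSERT INTO t (c) VALUES (1);   -- duplicate key error, but id 2 is already consumed
INSERT INTO t (c) VALUES (2);   -- succeeds with id = 3: the sequence has a hole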

9. What does redo log do?

redo log (the redo log) is specific to the InnoDB storage engine; it is what gives MySQL crash recovery.

For example, if the MySQL instance crashes or the machine loses power, on restart the InnoDB storage engine uses the redo log to recover the data, guaranteeing durability and integrity.

When table data is updated and the page to be updated is already in the Buffer Pool, it is updated directly in the Buffer Pool. A record of "what modification was made on a certain data page" is then written into the redo log cache (redo log buffer), and later flushed to the redo log file.

10. Timing of flushing redo log

[Figure: redo log buffer, page cache, and disk]

  • The red part is the redo log buffer which belongs to the memory

  • The yellow part is the page cache, which has been written to the disk at this time, but has not been persisted

  • The green part is the hard disk, which has been persisted

The InnoDB storage engine provides the innodb_flush_log_at_trx_commit parameter for the flushing strategy of the redo log, which supports three strategies

  • When set to 0, no disk operation is performed at transaction commit; the log stays in the redo log buffer. If MySQL crashes, up to 1 second of data is lost;

  • When set to 1 (the default), a disk flush is performed every time a transaction commits, persisting the log to disk;

  • When set to 2, at every transaction commit the redo log buffer content is only written into the page cache; if the OS goes down, up to 1 second of data is lost because it has not been persisted;

The innodb_flush_log_at_trx_commit parameter defaults to 1, which means that when the transaction is committed, fsync (synchronous operation) will be called to flush the redo log.

In addition, the InnoDB storage engine has a background thread that writes the contents of the redo log buffer to the file system cache (page cache) every 1 second, and then calls fsync to flush the disk.

When the space occupied by the redo log buffer is about to reach half of the innodb_log_buffer_size, the background thread will actively flush the disk.
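The current strategy can be inspected and changed at runtime; a sketch:

SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
-- 1 is the safe default: fsync the redo log on every commit
SET GLOBAL innodb_flush_log_at_trx_commit = 1;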

11. How does redo log record logs?

The redo log on disk is not a single file; it exists as a log file group, and every redo log file in the group has the same size.

For example, it can be configured as a group of 4 files of 1 GB each, so the whole redo log file group can record 4 GB of content.

It is used as a ring: writing starts at the beginning, and when the end is reached, writing wraps around to the beginning again, as shown in the figure below.

[Figure: redo log file group written as a ring]

Therefore, if the log group is full but the data has not yet actually been flushed to disk, writes must stall ("jitter"): to the naked eye MySQL appears to hang for a moment, and at that time it is busy flushing to disk.

12. What is binlog

binlog is the archive log and belongs to the server layer. It is a binary-format file whose records describe the original logic of statements, something like "add 1 to the c field of the row with ID=2".

Regardless of which storage engine is used, as long as table data is updated, binlog entries are generated. Its main uses are data backup and master-slave replication.

binlog records all logical operations that update data; it is a logical log and is written sequentially (append-only).

13. Binlog record format

binlog has three formats, which can be specified with the binlog_format parameter.

  • statement : records the original text of the SQL statement; it can have data-consistency problems;

  • row : records the specific data affected by the operation, which guarantees consistency of the replicated data;

  • mixed : a mixture of the two. MySQL judges whether a given SQL statement may cause data inconsistency: if so, the row format is used, otherwise statement.
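The format can be inspected and switched at runtime; a sketch:

SHOW VARIABLES LIKE 'binlog_format';
SET GLOBAL binlog_format = 'ROW';   -- row is the safest choice for replication consistency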

14. Binlog writing mechanism

During the execution of a transaction, the log is first written to the binlog cache; when the transaction commits, the binlog cache is written to the binlog file.

Because a transaction's binlog cannot be split apart, it must be written in one piece no matter how large the transaction is, so the system allocates one block of memory per thread as its binlog cache.

The binlog_cache_size parameter controls the size of a single thread's binlog cache; if the content exceeds this size, it has to be temporarily spilled to disk (swap).

binlog also provides the sync_binlog parameter to control the timing of writing to page cache and disk:

  • 0: at every transaction commit, the log is only written to the file system's page cache, and the system decides when to fsync. If the machine goes down, the binlog still in the page cache is lost.

  • 1: fsync is executed at every transaction commit, just like the redo log flush process.

  • N (N>1): at every transaction commit the log is written to the page cache, but fsync is executed only after N transactions have accumulated. If the machine goes down, the binlog of the most recent N transactions is lost.
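A sketch of checking and setting it; sync_binlog = 1 together with innodb_flush_log_at_trx_commit = 1 is the common "double 1" durability configuration:

SHOW VARIABLES LIKE 'sync_binlog';
SET GLOBAL sync_binlog = 1;   -- fsync the binlog on every transaction commit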

15. What is the difference between redolog and binlog

  • redo log is specific to InnoDB, while binlog lives at the server layer and is shared by all storage engines;

  • redo log records physical changes with specific values ("what modification was made on a certain page"), while binlog records the logical content of the operation;

  • binlog is appended: when a file reaches its size limit (or on flush logs) a new file is created, while the redo log has a fixed size and can only be written cyclically;

  • binlog has no crash-safe capability and can only be used for archiving, while the redo log is crash-safe;

  • the redo log can be written continuously while the transaction executes (with the flush setting at 1, via the background thread that runs every second, or when the redo log buffer is about to reach half of innodb_log_buffer_size), while the binlog is written to the file system cache only when the transaction commits;

16. Two-phase commit

Assuming that after the redo log is written in the process of executing sql, an exception occurs during the writing of the binlog log, what will happen?

Because the binlog is abnormal before it is finished, there is no corresponding modification record in the binlog at this time. Therefore, when the binlog log is used to restore data later, this update will be omitted, and the final data will be inconsistent .

In order to solve the problem of logical consistency between two logs, the InnoDB storage engine uses a two-phase commit scheme.

The write of the redo log is split into two steps, prepare and commit: this is two-phase commit. With two-phase commit, an exception while writing the binlog no longer causes inconsistency: when MySQL recovers data from the redo log and finds a record still in the prepare stage with no corresponding binlog entry, it rolls the transaction back.

Now consider another scenario: an exception occurs during the commit phase of the redo log. Will the transaction be rolled back?

No. Although the redo log is still in the prepare stage, the corresponding binlog entry can be found through the transaction id, so MySQL considers the transaction complete and commits it, recovering the data.

17. What is undo log

We know that if we want to ensure the atomicity of transactions , we need to roll back the executed operations (INSERT, DELETE, UPDATE) when an exception occurs. In MySQL, the recovery mechanism is implemented through the rollback log (undo log). All transaction modifications will be recorded in this rollback log first, and then related operations are performed.

Every time a record is changed, an undo log entry is recorded, and each undo log entry carries a DB_ROLL_PTR attribute (roll pointer). These entries are linked together into a linked list, forming a version chain.

The head node of the version chain is the latest value of the current record.

[Figure: undo log version chain]

18. What is relaylog

Relaylog is a relay log, which is used during master-slave synchronization . It is an intermediary temporary log file used to store the binlog log content synchronized from the master node.

[Figure: relay log in master-slave replication]

After the binlog of the master master node is transmitted to the slave node, it is written into the relay log, and the slave sql thread of the slave node reads the log from the relaylog and applies it locally to the slave node.

The slave server I/O thread reads the binary log of the master server and records it to the local file of the slave server, and then the SQL thread reads the content of the relay-log log and applies it to the slave server, so that the data of the slave server and the master server are consistent .

19. Index

Index is actually a data structure that can help us quickly retrieve data in the database.

The function of the index is equivalent to the table of contents of the book. For example: when we look up a dictionary, if there is no directory, then we can only find the word we need to look up page by page, and the speed is very slow. If there is a table of contents, we only need to go to the table of contents to find the position of the word, and then directly turn to that page.

20. Hash index

A hash table is a collection of key-value pairs. The corresponding value (value) can be quickly retrieved through the key (key), so the hash table can quickly retrieve data (close to O(1)).

but! The hash algorithm has a Hash conflict problem, which means that multiple different keys finally get the same index. Usually, our common solution is the chain address method .

The chain address method is to store the hash collision data in the linked list. For example, before JDK1.8, HashMap used the chain address method to resolve hash conflicts. However, after JDK1.8, HashMap introduced a red-black tree in order to reduce the search time when the linked list is too long.

In order to reduce the occurrence of Hash collisions, a good hash function should "uniformly" distribute data in the entire set of possible hash values.

Since a hash table is so fast, why doesn't MySQL use it as the default index structure? Mainly because hash indexes do not support ordering or range queries: if we want to sort the data in a table or run a range query, a hash index cannot help, since it only supports equality lookups.
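InnoDB's ordinary indexes are always B+Tree, but the MEMORY engine does support hash indexes, which makes the limitation easy to see; a sketch with a hypothetical kv table:

CREATE TABLE kv (
  k VARCHAR(32),
  v VARCHAR(255),
  INDEX idx_k (k) USING HASH
) ENGINE = MEMORY;

SELECT v FROM kv WHERE k = 'abc';   -- equality lookup: the hash index works
SELECT v FROM kv WHERE k > 'abc';   -- range query: the hash index cannot help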

21. B-tree and B+ tree

  • All nodes of the B-tree store both keys and data, while only leaf nodes of the B+ tree store keys and data, and other internal nodes only store keys.

  • The leaf nodes of the B tree are all independent; the leaf nodes of the B+ tree have a reference chain pointing to its adjacent leaf nodes.

  • The retrieval process of the B-tree is equivalent to performing a binary search on the keywords of each node in the range, and the retrieval may end before reaching the leaf node. The retrieval efficiency of the B+ tree is very stable. Any search is a process from the root node to the leaf node, and the sequential retrieval of the leaf nodes is obvious.

22. Primary key index

The primary key column of the data table uses the primary key index, a special unique index.

In MySQL's InnoDB tables, when no primary key is explicitly specified for a table, InnoDB automatically checks whether the table has a unique index on columns that do not allow NULL values; if so, that field is chosen as the default primary key, otherwise InnoDB automatically creates a hidden 6-byte auto-increment row id as the primary key.

23. Secondary index

The secondary index is also called the auxiliary index because the data stored in the leaf nodes of the secondary index is the primary key. That is to say, through the secondary index, the position of the primary key can be located.

Indexes such as unique indexes, ordinary indexes, and prefix indexes are secondary indexes.

  • Unique index (Unique Key): a unique index is also a constraint. The values of the index column must be unique, though NULL values are allowed; for a composite unique index, the combination of column values must be unique. A table may have multiple unique indexes. Most of the time, a unique index is created for the uniqueness of the data in the column, not for query efficiency.

  • Ordinary index (Index): its only function is to speed up queries. A table may have multiple ordinary indexes, and duplicate values and NULLs are allowed.

  • Prefix index (Prefix): applicable only to string types. A prefix index indexes only the first few characters of the text, so it is smaller than an ordinary index on the full column.

  • Composite index: an index created on multiple fields. The index is used only when the first field of the index appears in the query condition; composite indexes follow the leftmost prefix principle (described later);

  • Full-text index (Full Text): mainly used to retrieve keywords in large text data; it is the technique search-engine style queries rely on. Before MySQL 5.6 only the MyISAM engine supported full-text indexes; since 5.6, InnoDB supports them as well.

Full-text indexing in MySQL has two variables, the minimum search length and the maximum search length; words shorter than the minimum or longer than the maximum are not indexed.
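A sketch of creating a prefix index and a full-text index, assuming a hypothetical article table with title and body columns:

ALTER TABLE article ADD INDEX idx_title_prefix (title(10));  -- index only the first 10 characters
ALTER TABLE article ADD FULLTEXT INDEX ft_body (body);       -- InnoDB supports this since 5.6

-- MATCH() must name exactly the columns of the full-text index
SELECT * FROM article WHERE MATCH(body) AGAINST('mysql');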

24. Clustered index and non-clustered index

A clustered index is an index in which the index structure and data are stored together, not a separate index type. The leaf nodes of InnoDB's primary key index store data rows, so it belongs to the clustered index.

In MySQL, the .ibd file of the InnoDB engine table contains the index and data of the table. For the InnoDB engine table, each non-leaf node of the index (B+ tree) of the table stores the index, and the leaf node stores the index and the data corresponding to the index.

A non-clustered index is an index in which the index structure and data are stored separately, not a separate index type. Secondary indexes (auxiliary indexes) are non-clustered indexes. MySQL's MyISAM engine, regardless of primary key or non-primary key, uses non-clustered indexes.

The auxiliary index is an index we create ourselves. Its leaf nodes store the primary key; after finding the primary key through the auxiliary index, we use it to look up the full row in the primary key index (going back to the table).

25. Back to table

Going back to the table means: first scan the secondary index tree to locate the row and obtain its primary key id, then use that id to fetch the full row from the primary key index tree. In other words, a query driven by a non-primary-key index has to scan one extra index tree.

26. Covering index and joint index

If an index contains (covers) the values of all the fields a query needs, we call it a "covering index": the required data can be read from the index alone, with no need to go back to the table, which reduces the database's I/O operations and improves query efficiency.

An index built on multiple fields of a table is a joint index, also called a composite or compound index.

27. The leftmost prefix matching principle

The leftmost prefix matching principle means that when a joint index is used, MySQL matches the query conditions from left to right in the order of the fields in the joint index. If the query condition contains the leftmost field of the index, that field is used to filter a batch of data, and matching continues field by field until all fields of the joint index have been matched, or until a range condition is encountered, such as >, <, between, or a like query starting with %; at that point matching stops.

Therefore, when we design a joint index, we should place the most selective fields on the far left, so that more data is filtered out early.
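A sketch of the principle, assuming a hypothetical table t with a joint index on (a, b, c):

ALTER TABLE t ADD INDEX idx_abc (a, b, c);

SELECT * FROM t WHERE a = 1 AND b = 2;            -- uses the index on (a, b)
SELECT * FROM t WHERE b = 2 AND c = 3;            -- no leading column a: index not used
SELECT * FROM t WHERE a = 1 AND b > 2 AND c = 3;  -- matching stops at the range on b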

28. Index push down

Index Condition Pushdown (ICP) is an index optimization introduced in MySQL 5.6. While traversing a non-clustered index, the storage engine can first evaluate conditions on the fields contained in the index, filter out non-matching records, and thereby reduce the number of back-to-table lookups.
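A sketch using the tb_student table from later sections, assuming a joint index on (name, age); with ICP, the age condition is checked inside the index before any back-to-table lookup:

EXPLAIN SELECT * FROM tb_student
WHERE name LIKE '张%' AND age = 18;
-- the Extra column shows "Using index condition" when ICP is applied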

29. Implicit conversion

When operators are used with operands of different types, type conversion occurs to make the operands compatible. Some conversions happen implicitly. For example, MySQL automatically converts strings to numbers and vice versa as needed. The following rules describe how comparison operations are transformed:

  1. When at least one of the two parameters is NULL, the comparison result is also NULL. In a special case, when using <=> to compare two NULLs, it will return 1. In both cases, no type conversion is required;

  2. Both parameters are strings, and will be compared according to strings without type conversion;

  3. Both parameters are integers, compared according to integers, without type conversion;

  4. When comparing hexadecimal values ​​with non-numbers, they are treated as binary strings;

  5. If one argument is a TIMESTAMP or DATETIME column and the other is a constant, the constant is converted to a timestamp;

  6. One parameter is of decimal type. If the other parameter is decimal or integer, the integer will be converted to decimal for comparison. If the other parameter is floating point, decimal will be converted to floating point for comparison;

  7. In all other cases, both arguments are converted to floats and compared;
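Rule 7 is the one that most often bites in practice: comparing an indexed string column to a number converts every row's value to a float, so the index cannot be used. A sketch with a hypothetical t_user table whose phone column is VARCHAR and indexed:

SELECT * FROM t_user WHERE phone = 13800000000;    -- implicit conversion: full table scan
SELECT * FROM t_user WHERE phone = '13800000000';  -- string vs string: index used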

30. How to choose ordinary index and unique index?

  • Query

    • With an ordinary index as the condition, after the first matching record is found, scanning continues to the following records until a non-matching record is encountered;

    • With a unique index as the condition, the first matching record is returned directly and scanning stops;

  • Update

    • For an ordinary index, if the target page is not in memory, the update can simply be recorded in the change buffer and applied later;

    • A unique index must read the page into memory to check whether the data conflicts, so it cannot use the change buffer;

Therefore, unique indexes are better suited to query-heavy scenarios, and ordinary indexes to insert- and update-heavy scenarios.

31. Avoid index failure

Index failure is also one of the main reasons for slow queries. The common situations that lead to index failure are as follows:

  • Query with SELECT *;

  • A composite index is created, but the query condition does not comply with the leftmost matching principle;

  • Perform operations such as calculations, functions, and type conversions on indexed columns;

  • LIKE queries starting with %, such as like '%abc';

  • If or is used in the query condition and any column on either side of the or has no index, none of the involved indexes will be used;

  • The columns specified in the match() function must be exactly the same as those specified in the full-text index, otherwise an error will be reported and the full-text index cannot be used.

  • With full-text indexes, pay attention to the search length: words outside the minimum/maximum search length are not indexed, so searching for them cannot use the index.
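Two of these failures and their rewrites, sketched on a hypothetical table t with indexes on create_time and name:

SELECT * FROM t WHERE DATE(create_time) = '2024-01-01';  -- function on the column: index fails
SELECT * FROM t WHERE create_time >= '2024-01-01'
                  AND create_time <  '2024-01-02';       -- rewritten as a range: index usable

SELECT * FROM t WHERE name LIKE '%abc';   -- leading %: index fails
SELECT * FROM t WHERE name LIKE 'abc%';   -- prefix match: index usable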

32. Rules for indexing

  • Fields that are not NULL: index columns should avoid NULL if possible, because NULL values are hard for the database to optimize. If a field is queried frequently but cannot avoid NULL, consider using short values or short strings with clear semantics, such as 0, 1, true, false, as substitutes.

  • Frequently queried fields: The fields we create indexes should be fields that are frequently queried.

  • Fields queried as conditions: Fields queried as WHERE conditions should be considered for indexing.

  • Fields that frequently need to be sorted: the index has been sorted, so that the query can use the sorting of the index to speed up the sorting query time.

  • Fields that are frequently used for connection: Fields that are often used for connection may be some foreign key columns. For foreign key columns, it is not necessary to establish a foreign key, just that the column involves the relationship between tables. For fields that are frequently queried by joins, indexing can be considered to improve the efficiency of multi-table join queries.

  • Frequently updated fields should be carefully indexed;

  • Consider building joint indexes instead of single-column indexes as much as possible;

  • Consider using prefix indexes instead of normal indexes on fields of string type;

  • Delete indexes that have not been used for a long time;

33. Transaction ACID characteristics

A transaction consists of n units of work; during execution these n units either all succeed or all fail together. A simple example: regardless of whether the answers are correct, an exam paper consists of multiple questions; when you finish, you hand the whole paper to the teacher rather than handing in each question separately. Here the exam paper can be understood as a transaction.

The characteristics of a transaction:

  • A: Atomicity ( Atomicity), atomicity means that a transaction is an indivisible unit of work, and the operations in a transaction either all occur or none occur.

  • C: Consistency ( Consistency), in a transaction, the integrity of the data before and after the transaction must be consistent.

  • I: Isolation ( Isolation), which exists in multiple transactions. The isolation of transactions means that when multiple users access the database concurrently, the transactions of one user cannot be interfered by the transactions of other users, and the data between multiple concurrent transactions must be isolated from each other.

  • D: Durability ( Durability), durability means that once a transaction is committed, its changes to the data in the database are permanent; even if the database subsequently fails, the committed changes should not be affected.

34. Problems caused by concurrent transactions

  • Dirty read: transaction B reads data that transaction A has not yet committed;

  • Lost modification (lost update): transaction A reads a data item while transaction B also accesses it; after A modifies the data, B modifies it too, so the modification made in A is lost;

  • Non-repeatable read: transaction B's reads straddle transaction A's commit, i.e. the content B reads before and after A commits is inconsistent (A and B operate on the same row);

  • Phantom read / virtual read: transaction A performs an insert while transaction B is reading, i.e. the number of rows B reads before and after A's transaction is inconsistent.

35. Transaction isolation level

To solve the concurrency problems described above, the database provides a transaction isolation mechanism.

[Table: isolation levels and the problems each one solves]

  • read uncommitted: changes a transaction makes are visible to other transactions before it commits; reading uncommitted data solves none of these problems;

  • read committed: changes a transaction makes become visible to others only after it commits; reading committed data solves dirty reads — Oracle's default;

  • repeatable read: the data seen during a transaction's execution is always consistent with what it saw when the transaction started; this solves dirty reads and non-repeatable reads — MySQL's default;

  • serializable: as the name implies, for the same row a "write" takes a write lock and a "read" takes a read lock. When a read-write lock conflict occurs, the later transaction must wait for the earlier one to finish before continuing. This solves dirty reads, non-repeatable reads, and phantom reads — effectively locking the table.

Although the serializable level can solve all database concurrency problems, it will lock every row of data read, which may cause a lot of timeout and lock competition problems, resulting in a decrease in efficiency. Therefore, we rarely use serializable in practical applications. Only when it is very necessary to ensure data consistency and can accept no concurrency, should we consider adopting this level.
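The level can be inspected and changed per session; a sketch (the variable is @@transaction_isolation since MySQL 8.0, @@tx_isolation before):

SELECT @@transaction_isolation;
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;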

36、MVCC

If the granularity of the lock is too large, the performance will decrease. There is a MVCC method with better performance under the InnoDB engine of MySQL.

MVCC is the abbreviation of Multi-Version Concurrency Control, a multi-version concurrency control protocol that uses version numbers to avoid contention on the same data between different transactions. Its main goal is to improve the database's concurrent read-write performance, letting multiple transactions read and write concurrently without locking.

The implementation of MVCC relies on hidden columns, Undo log, Read View .

From the above introduction to the four isolation levels defined by the SQL standard, it can be seen that in the standard SQL isolation level definition, REPEATABLE-READ (repeatable read) cannot prevent phantom reading .

However, the REPEATABLE-READ isolation level implemented by InnoDB can actually solve the problem of phantom reading, mainly in the following two situations:

  • Snapshot read: The MVCC mechanism ensures that phantom reads do not occur.

  • Current reading: Use Next-Key Lock (proximity key lock) to lock to ensure that phantom reading does not occur. Next-Key Lock is a combination of row lock (Record Lock) and gap lock (Gap Lock). Row lock can only lock existing rows. In order to avoid inserting new rows, you need to rely on gap locks.

The InnoDB storage engine generally uses the SERIALIZABLE isolation level in the case of distributed transactions.

37. Locks in Mysql

By operation type, locks divide into read locks and write locks; the concept is similar to read-write locks in Java and can be understood as shared locks and exclusive locks. By granularity, locks divide into row locks, page locks, and table locks; row locks and table locks are used most. Granularity here refers to the scope a lock covers, and that scope directly affects the degree of concurrency. Row locks allow the highest concurrency but have the highest locking cost; they are typical of the InnoDB engine. Table locks are cheap to take, but their scope is large and their concurrency is the lowest; they are typical of the MyISAM engine, which, given that reads are shared, suits read-heavy query scenarios.

We know that locks and transaction isolation levels both exist to handle concurrency; isolation levels can be understood with the help of the redo and undo logs, so how do they relate to locks? One way to see it: the lock mechanism provides coarse-grained control, but because of how storage is structured, reads and writes are not instantaneous, which gives rise to dirty reads, dirty writes, non-repeatable reads, and phantom reads; the solutions to those problems are realized with the help of the MVCC mechanism.

38. Query statement execution process

select * from tb_student s where s.age = '18' and s.name = '张三';

  • First check whether the user has permission for the statement. If not, an error is returned directly. If so, then (before MySQL 8.0) the query cache is checked first, using this SQL statement as the key to look up whether a result exists in memory; on a hit, the cached result is returned directly, otherwise execution moves to the next step.

  • The analyzer performs lexical analysis to extract the key elements of the SQL statement. For the statement above, it identifies a select query on the table tb_student, selecting all columns, with the query conditions age='18' and name='张三'. It then checks whether the SQL has grammar errors, such as incorrect keywords. If everything checks out, execution moves to the next step.

  • Next, the optimizer determines the execution plan. The SQL statement above admits two plans:

    • a. First find the students named "张三" in the student table, then filter those whose age is 18.

    • b. First find the students aged 18, then filter those named "张三". The optimizer then chooses, according to its own optimization algorithm, the plan it believes executes most efficiently (what the optimizer believes best is not always actually best). Once the plan is confirmed, execution can begin.

  • Perform permission verification, if there is no permission, an error message will be returned, if there is permission, the database engine interface will be called, and the execution result of the engine will be returned.

The execution process of the query statement is as follows: permission verification (if it hits the cache) ---> query cache ---> analyzer ---> optimizer ---> permission verification ---> executor ---> engine

39. Update statement execution process

update tb_student A set A.age = '19' where A.name = '张三';

This statement basically follows the process of the previous query, except that it needs to record the log when executing the update, which will introduce the log module. The log module that comes with MySQL is binlog (archive log), which can be used by all storage engines. Our commonly used InnoDB engine also comes with a log module redo log (redo log). Let’s discuss the execution process of this statement in InnoDB mode.

  • First query the data of Zhang San, if there is a cache, it will also use the cache.

  • Then get the query statement and change the age to 19, and then call the engine API interface to write this row of data. The InnoDB engine saves the data in the memory and records the redo log at the same time. At this time, the redo log enters the prepare state, and then tells the executor that the execution is completed and can be submitted at any time.

  • After the executor receives the notification, it writes the binlog, then calls the engine interface again to set the redo log to the commit state.

  • update completed.

The execution flow of the update statement is as follows: analyzer---->permission verification---->executor--->engine---redo log (prepare state)---> binlog --->redo log (commit state)

40. SQL optimization

  1. Full table scans should be avoided as much as possible, and indexes should first be considered on the columns involved in where and order by;

  2. Try to avoid using the following statements in the where clause, otherwise the engine will give up using the index and perform a full table scan;

    • NULL tests on the field,

    • Use != or <>

    • or to connect conditions (use union all instead)

    • in and not in should also be used with caution

    • Fuzzy queries with a leading wildcard (a full-text index can be used instead)

    • Reduce expression operations

    • function operation

  3. Do not use select * from t anywhere, replace "*" with a specific field list, and do not return any fields that are not used;

  4. It is best not to have more than 6 indexes in a table. If there are too many, you should consider whether it is necessary to build indexes on some columns that are not frequently used;

  5. In many cases, it is a good choice to use exists instead of in;

  6. Minimize multi-table joint queries;

  7. Pagination optimization (see the sketch after this list);

  8. Use indexes correctly;
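A sketch of item 7, pagination optimization, on a hypothetical table t with primary key id: a deep OFFSET forces MySQL to read and discard all the skipped rows, while keyset pagination seeks directly via the index:

SELECT * FROM t ORDER BY id LIMIT 1000000, 10;            -- reads 1,000,010 rows, returns 10
SELECT * FROM t WHERE id > 1000000 ORDER BY id LIMIT 10;  -- seeks by the remembered last id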

41. Master-slave synchronization data

[Figure: master-slave synchronization flow]

  • The master writes the update event to the master's binlog file

  • The master creates a log dump thread to notify the slave that data needs to be updated

  • The slave sends a request to the master node and saves the content of the binlog file to the local relaylog

  • The slave starts the sql thread to read the content in the relaylog, and re-executes the content locally to complete the master-slave data synchronization

Synchronization strategy :

  • Full synchronous replication : the master library forcibly synchronizes logs to the slave library, and returns to the client after all the slave libraries are executed, which has poor performance;

  • Semi-synchronous replication : The master library considers the operation successful when it receives at least one confirmation from the slave library, and the slave library writes to the log successfully and returns an ack confirmation;

42. How to solve the master-slave delay

  • After MySQL 5.6, a parallel replication method is provided, which replays by converting SQL threads into multiple worker threads

  • Improve the machine configuration (the most direct fix)

  • Choose an appropriate sharding (sub-database, sub-table) strategy at the start of the business, to avoid the extra replication pressure caused by a single oversized database or table

  • avoid long transactions

  • Avoid letting the database perform various large-scale operations

  • For some businesses that are sensitive to delay, directly use the main library to read

43. Why not use long transactions

  • Under concurrency, the database connection pool is easily exhausted

  • It easily causes heavy blocking and lock timeouts ; long transactions hold lock resources and may drag down the whole database

  • Long execution time easily causes master-slave delay

  • Rollback takes a long time : the longer the transaction, the longer the rollback

  • The undo log grows larger and larger : a long transaction means the system retains very old transaction views; since those transactions may access any data in the database at any time, all the rollback records they might need must be kept until they commit, occupying a large amount of storage space.

44. What is adaptive hashing?

        InnoDB monitors searches on the table's indexes; if it observes that building a hash index would speed up queries, it builds one automatically. This process requires no user intervention. (Enabled by default.)

45. What are dirty reads, phantom reads and non-repeatability?

        Dirty read: a transaction reads data that another transaction has not committed. Transaction A reads data updated by transaction B, then B rolls back: the data A read is dirty data.

        Non-repeatable read: the content read twice within one transaction is inconsistent. Transaction A reads the same data several times while transaction B updates and commits that data in between, so A's repeated reads of the same data give different results.

        Phantom read: the number of rows read twice within one transaction is inconsistent. System administrator A changes the grades of all students in the database from numeric scores to ABCDE grades, but system administrator B inserts a record with a numeric score at the same time; when A finishes the modification, one record appears unchanged, as if it were a hallucination. That is a phantom read.

46. What are the functions of database locks and what kind of locks are there?

        When the database has concurrent transactions, data inconsistencies may occur. At this time, some mechanisms are needed to ensure the order of access. The lock mechanism is such a mechanism. That is, the role of the lock is to solve the concurrency problem.

        From the granularity of locks, locks can be divided into table locks, row locks, and page locks.

        Row-level lock: It is a kind of lock with the finest locking granularity, which means that only the row currently being operated is locked. Row-level locks can greatly reduce conflicts in database operations. Its locking granularity is the smallest, but the locking overhead is also the largest.

Row-level locks are expensive, slow to lock, and deadlocks may occur. But the locking granularity is the smallest, the probability of lock conflicts is the lowest, and the concurrency is the highest.

        Table-level lock: the lock with the largest granularity; it locks the entire table being operated on. It is simple to implement, consumes few resources, and is supported by most MySQL engines.

        Page-level lock: a lock whose granularity sits between row-level and table-level locks. Table-level locks are fast but conflict often; row-level locks conflict rarely but are slower; page-level locks are the compromise, locking a group of adjacent records at a time.

Their overhead and locking time sit between table locks and row locks, deadlocks can occur, and the concurrency they allow is middling.

From the nature of use, it can be divided into shared locks, exclusive locks and update locks.

        Share Lock: S lock, also known as read lock, is used for all read-only data operations.

S locks are not exclusive, and multiple concurrent transactions are allowed to lock the same resource, but X locks are not allowed while adding S locks, that is, resources cannot be modified. The S lock is usually released immediately after the end of the read, without waiting for the end of the transaction.

Exclusive lock: X lock, also known as write lock, means to write data.

The X lock only allows one transaction to lock the same resource, and it will not be released until the end of the transaction. Any other transaction must wait until the X lock is released to access the page.

Use the select * from table_name for update; statement to generate an X lock.
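Both lock types can be taken explicitly inside a transaction; a sketch on a hypothetical table t (MySQL 8.0 also accepts FOR SHARE in place of LOCK IN SHARE MODE):

SELECT * FROM t WHERE id = 1 LOCK IN SHARE MODE;  -- S lock: others may read, not write
SELECT * FROM t WHERE id = 1 FOR UPDATE;          -- X lock: exclusive until the transaction ends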

        Update lock: U lock, which is used to schedule X locks on resources, allowing other transactions to read, but not allowing U locks or X locks to be applied.

When the read page is about to be updated, it is upgraded to an X lock, and the U lock cannot be released until the end of the transaction. Therefore, the U lock is used to avoid the deadlock phenomenon caused by the use of shared locks.

Divided by attitude toward concurrency, there are optimistic locks and pessimistic locks.

        Optimistic Lock: as the name suggests, it assumes the resource will not be modified, so data is read without locking; only at update time is a version-number mechanism used to confirm whether the resource was modified in the meantime.

Optimistic locking suits read-heavy applications and can improve system throughput.

        Pessimistic Lock: As the name suggests, it has strong exclusive and exclusive characteristics. Every time data is read, it is considered to be modified by other transactions, so each operation needs to be locked.

47. What is the relationship between isolation level and lock?

        1) At the Read Uncommitted level, reading data does not require a shared lock, so that it will not conflict with the exclusive lock on the modified data;

        2) At the Read Committed level, the read operation requires a shared lock, but the shared lock is released after the statement is executed;

        3) At the Repeatable Read level, the read operation needs to add a shared lock, but the shared lock is not released before the transaction is committed, that is, the shared lock must be released after the transaction is completed;

        4) At the SERIALIZABLE level, it is the most restrictive, because this level locks the entire range of keys and holds the lock until the transaction completes.

48. Lock algorithm in InnoDB?

        Record lock: a lock on a single row record.

        Gap lock: locks a gap, i.e. a range between index records, excluding the records themselves.

        Next-key lock: record lock + gap lock; locks a range including the record itself.

49. Stored procedure

        1. What is a stored procedure?

        A stored procedure is a precompiled SQL statement. The advantage is that it allows a modular design, that is, it only needs to be created once, and it can be called multiple times in the program later. If an operation needs to execute multiple SQLs, using stored procedures is faster than pure SQL statements.

        2. What is the difference between a stored procedure and a function?

        1) Differences in return values: Functions have one return value, while stored procedures are returned through parameters, which can have multiple or none.

        2) The difference between calls: the function can be called directly in the query statement, but the stored procedure must be called separately.
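A minimal sketch of a stored procedure, using the tb_student table from earlier sections and returning a value through an OUT parameter:

DELIMITER //
CREATE PROCEDURE get_student_count_by_age (IN p_age INT, OUT p_count INT)
BEGIN
  SELECT COUNT(*) INTO p_count FROM tb_student WHERE age = p_age;
END //
DELIMITER ;

CALL get_student_count_by_age(18, @cnt);
SELECT @cnt;   -- read the OUT parameter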

50. What are the common logs in MySQL?

        Redo log (redo log): physical log

The role is to ensure the durability of the transaction. The redo log records the state after the transaction is executed, and is used to restore the committed transaction data that has not been written to the data file.

        Rollback log (undo log): logical log

The role is to ensure the atomicity of data. It saves a version of the data before the transaction occurs, which can be used for rollback, and can also provide read under multi-version concurrency control (MVCC), that is, non-locking read.

Binary log (binlog): logical log

It is often used in master-slave synchronization or data synchronization, and can also be used for database point-in-time restoration.

Error log (errorlog)

Records MySQL start and stop events and information about errors that occur while the server is running. If no error log file is configured, error messages are written to standard error output.

general query log

It records every command received by the server, regardless of whether the command statement is correct or not, so it will bring a lot of overhead, so it is also turned off by default.

Slow query log (slow query log)

Records query statements whose execution time exceeds long_query_time (10 seconds by default) and, optionally, queries that do not use indexes; only successfully executed statements are recorded.
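A sketch of turning the slow query log on at runtime:

SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;     -- threshold in seconds; the default is 10
SHOW VARIABLES LIKE 'slow_query%';  -- also shows the log file path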

Relay log (relay log)

Store the received binlog log content in the slave node for master-slave synchronization.

51. Master-slave replication

1. What is master-slave replication?

Master-slave replication is used to build a database environment identical to the master database, namely the slave database; the master is generally the live business database, and the slave follows it in near real time.

2. What is the role of master-slave replication?

    Read-write separation, enabling the database to support greater concurrency.
    High availability and hot backup of data: as a standby database, after the primary database server fails, the system can switch to the secondary database and continue working, avoiding data loss.

3. What is the architecture of master-slave replication?

        One master and one slave or one master and many slaves

        When the request pressure of the main library is very high, the read-write separation can be realized by configuring a one-master multi-slave replication architecture, and a large number of requests that do not require high real-time performance are distributed to multiple slave libraries through load balancing to read data, reducing the reading pressure of the main library. And when the main library is down, a slave library can be switched to the main library to continue to provide services.

        master-master replication

        The dual-master replication architecture is suitable for scenarios that require master-slave switching. The two databases are mutually master-slave. When the master database recovers from a downtime, it will still copy the data on the new master database because it is still the slave of the original slave database (now the master database). So no matter how the role of the main library is switched, the original main library will not be separated from the replication environment.

        Multiple masters and one slave (supported since 5.7)
        cascade replication

        Because each slave library has an independent Binlog Dump thread on the main library to push binlog logs, as the number of slave libraries increases, the IO pressure and network pressure of the main library will also increase. At this time, the cascaded replication architecture came into being.

        The cascaded replication architecture is only based on one master and multiple slaves, and a second-level master library Master2 is added between the master library and each slave library. This second-level master library is only used to push the Binlog logs sent to it by the first-level master library and then push them to each slave library, so as to reduce the push pressure on the first-level master library.

52. What is the realization principle of master-slave replication?

        The database has a binlog binary file that records the SQL statements which modify data. The goal of master-slave synchronization is to copy the statements from the master's binlog file to the slave's relay log and have the slave execute those SQL statements again.

        The specific implementation requires three threads:

        Binlog dump thread: whenever a slave connects to the master, the master creates a thread that sends binlog content to that slave.

        In the slave library, when the replication starts, the slave library will create two threads for processing:

        Slave I/O thread: when the START SLAVE statement is executed on the slave, the slave creates an I/O thread that connects to the master and asks it to send the update records in its binlog. The slave I/O thread reads the updates sent by the master's binlog dump thread and copies them to local files, namely the relay log.

        Slave SQL thread: the slave also creates a SQL thread, which reads the events the I/O thread wrote to the relay log and executes them.
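A sketch of the classic commands on the slave side (host, user, and file name are hypothetical; MySQL 8.0.22+ prefers the CHANGE REPLICATION SOURCE TO / START REPLICA spellings):

CHANGE MASTER TO
  MASTER_HOST = '192.168.1.10',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '***',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 4;
START SLAVE;
SHOW SLAVE STATUS\G   -- check Slave_IO_Running and Slave_SQL_Running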

53. What are asynchronous replication and semi-synchronous?

        MySQL's master-slave replication has two replication methods: asynchronous replication and semi-synchronous replication.

        Asynchronous replication

        MySQL's default master-slave replication method is asynchronous replication: the master does not care whether the data has reached the slave or whether the slave executed it successfully.

        If it is necessary to achieve full synchronization, that is, the Master needs to wait for one or all Slaves to execute successfully before responding successfully, then the cluster efficiency can be imagined. Therefore, a compromise method appeared after MySQL 5.6 - semi-synchronization.

        With one master and one slave, or one master and many slaves, the master can report success to the requesting client as soon as it confirms that at least one slave has received the transaction and written it to its local relay log; the master does not wait for the slave to actually execute the transaction.

        In addition, during semi-synchronous replication, if a transaction commits successfully on the master but, while it is being pushed to the slave, the slave goes down or the network fails so that the slave does not receive the transaction's binlog, the master waits for a period of time (the number of milliseconds in rpl_semi_sync_master_timeout). If the binlog still cannot be pushed to the slave within that time, MySQL automatically switches from semi-synchronous to asynchronous replication, and switches back to semi-synchronous replication automatically once a slave catches up.

        The "half" of semi-synchronous replication is reflected in the fact that although the Binlog of the master-slave library is synchronized, the master library will not wait for the slave library to execute the Relay-log before returning, but confirms that the slave library has received the Binlog and returns after the master-slave Binlog synchronization is achieved. Therefore, the data of the slave library is still delayed for the master library. This delay is the time for the slave library to execute the Relay-log. So it can only be called semi-synchronous.

54. Common problems and solutions in master-slave?

        Problems

1) After the main database goes down, the data may be lost.

2) There is only one sql thread in the slave library, and the writing pressure of the master library is high, and the replication may be delayed.

        Solutions

1) Semi-synchronous replication: Ensure that the binlog is transferred to at least one slave library after the transaction is committed to solve the problem of data loss.

2) Parallel replication: multi-threaded apply binlog from the library to solve the problem of copy delay from the library.

   
