Summary of common MySQL interview questions



Table of contents

1. Three paradigms

2. Difference between DML statement and DDL statement

3. The difference between primary key and foreign key

4. The difference between drop, delete, and truncate

5. Infrastructure

6. What is the difference between MyISAM and InnoDB?

7. Recommend self-incrementing id as the primary key problem

8. Why MySQL's auto-increment primary key is not continuous

9. What does redo log do?

10. Timing of flushing redo log

11. How does redo log record logs?

12. What is binlog

13. Binlog record format

14. Binlog writing mechanism

15. What is the difference between redolog and binlog

16. Two-phase commit

17. What is undo log.

18. What is relaylog

19. Index

20. Hash index

21. B-tree and B+ tree

22. Primary key index

23. Secondary index

24. Clustered index and non-clustered index

25. Back to table

26. Covering index and joint index

27. The leftmost prefix matching principle

28. Index push down

29. Implicit conversion

30. How to choose ordinary index and unique index?

31. Avoid index failure

32. Rules for indexing

33. Transaction ACID characteristics

34. Problems caused by concurrent transactions

35. Transaction isolation level

36. MVCC

37. Locks in Mysql

38. Query statement execution process

39. Update statement execution process

40. SQL optimization

41. Master-slave synchronization data

42. How to solve the master-slave delay

43. Why not use long transactions


1. Three paradigms

        1NF (First Normal Form): every attribute (i.e. every field in the table) must be atomic: a field holds a single value and cannot be split into several smaller fields. 1NF is the most basic requirement of all relational databases; any table created in a relational database must satisfy the first normal form.

        2NF (Second Normal Form): on the basis of 1NF, every row must be uniquely identifiable by a primary key, and every non-key attribute must depend on the entire primary key rather than on only part of it (no partial dependencies).

        3NF (Third Normal Form): on the basis of 2NF, every non-key column must depend directly on the primary key rather than transitively through another non-key column; in other words, a table should not contain non-key information that belongs to other tables.

        During development it is not always necessary to satisfy all three normal forms; sometimes fields from other tables are deliberately kept redundant in a table to improve query efficiency.

2. Difference between DML statement and DDL statement

  • DML is the abbreviation of Data Manipulation Language. It refers to operations on the records in database tables, mainly inserting, updating, deleting, and querying table records, and it is what developers use most frequently in daily work.

  • DDL is the abbreviation of Data Definition Language. Simply put, it is the language used to create, delete, and modify objects inside the database. The biggest difference from DML is that DML only operates on the data inside tables and never touches table definitions or structure, whereas DDL defines and changes those objects. DDL statements are used more by database administrators (DBAs) and rarely by ordinary developers.

3. The difference between primary key and foreign key

  • Primary key: uniquely identifies a row of data; duplicates are not allowed, NULL values are not allowed, and a table can have only one primary key;

  • Foreign key: used to establish a relationship with another table; a foreign key references the primary key of another table, may contain duplicates and NULL values, and a table can have multiple foreign keys;
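
As a rough illustration (the tables and columns here are made up for the example), a primary key and a foreign key can be declared like this:

CREATE TABLE class (
  id   INT PRIMARY KEY,          -- primary key: unique, NOT NULL, at most one per table
  name VARCHAR(50) NOT NULL
);

CREATE TABLE student (
  id       INT PRIMARY KEY,
  name     VARCHAR(50) NOT NULL,
  class_id INT,                  -- foreign key column: duplicates and NULL are allowed
  FOREIGN KEY (class_id) REFERENCES class(id)
);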

4. The difference between drop, delete, and truncate

(1) Different usage

  • drop (discard data): DROP TABLE table_name removes the table structure together with its data; use it when the table itself should be deleted.

  • truncate (clear data): TRUNCATE TABLE table_name deletes only the data in the table, and the auto-increment id starts from 1 again for newly inserted rows; use it when the table's data should be emptied.

  • delete (delete data): DELETE FROM table_name WHERE column = value deletes the matching rows; without a WHERE clause the effect is similar to TRUNCATE TABLE table_name, but rows are removed one by one.

(2) They belong to different categories of SQL

  • truncate and drop are DDL (Data Definition Language) statements: they take effect immediately, the original data is not written to the rollback segment, they cannot be rolled back, and they do not fire triggers.

  • delete is a DML (Data Manipulation Language) statement: the operation is recorded in the rollback segment and takes effect only after the transaction is committed.

(3) Execution speed is different

  • When delete is executed, binlog records are generated; writing the log takes time, but it also makes data rollback and recovery possible.

  • truncate does not generate these row-level logs, so it is faster than delete. In addition, the table's auto-increment value is reset and the index is restored to its original size.

  • drop releases all the space occupied by the table.

In terms of speed: drop > truncate > delete
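
A minimal sketch of the three statements side by side (the table t is hypothetical):

DELETE FROM t WHERE id = 1;   -- DML: removes matching rows, is logged, can be rolled back
TRUNCATE TABLE t;             -- DDL: empties the table and resets the auto-increment counter
DROP TABLE t;                 -- DDL: removes the table structure and its data entirely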

5. Infrastructure

The figure below is a brief architecture diagram of MySQL. From the figure below, you can clearly see how a SQL statement of the client is executed inside MySQL.

[Figure: MySQL logical architecture]

  • Connector: related to identity authentication and authority (when logging in to MySQL).

  • Query cache: When executing a query statement, it will first query the cache (removed after MySQL 8.0, because this function is not very practical).

  • Analyzer: if the cache is not hit, the SQL statement goes through the analyzer, which first works out what the SQL statement is trying to do and then checks whether its syntax is correct.

  • Optimizer: chooses what MySQL considers the optimal execution plan.

  • Executor: Execute statements and return data from the storage engine. Before executing the statement, it will judge whether it has permission. If there is no permission, it will report an error.

  • Plug-in storage engine : It is mainly responsible for data storage and reading. It adopts a plug-in architecture and supports InnoDB, MyISAM, Memory and other storage engines.

6. What is the difference between MyISAM and InnoDB?

Before MySQL 5.5, the MyISAM engine was the default storage engine of MySQL, and after MySQL 5.5, InnoDB was the default storage engine of MySQL.

(1) Whether to support row-level locks

MyISAM only has table-level locks, while InnoDB supports row-level locks and table-level locks, and the default is row-level locks.

(2) Whether to support transactions

MyISAM does not provide transaction support, InnoDB provides transaction support, implements the four isolation levels defined by the SQL standard, and has the ability to commit and rollback transactions.

The REPEATABLE-READ (repeatable read) isolation level used by InnoDB by default can solve the problem of phantom reads (based on MVCC and Next-Key Locks).

(3) Whether to support foreign keys

MyISAM does not support it, but InnoDB does.

(4) Whether to support safe recovery after abnormal database crash

MyISAM does not support it, but InnoDB does. After the database using InnoDB crashes abnormally, when the database is restarted, it will ensure that the database is restored to the state before the crash. The recovery process depends on redo log.

(5) Whether to support MVCC

MyISAM does not support it, but InnoDB does.

(6) Index implementation

Although both the MyISAM engine and the InnoDB engine use B+Tree as the index structure, the implementation methods of the two are different.

  • In the InnoDB engine, its data files are themselves index files. The table data file itself is an index structure organized by B+Tree, and the data field of the leaf node of the tree stores complete data records.

  • MyISAM index files and data files are separated, and the index stores pointers to data files.

(7) Performance difference

The performance of InnoDB is stronger than that of MyISAM. No matter in read-write mixed mode or read-only mode, as the number of CPU cores increases, InnoDB's read and write capabilities increase linearly. Because MyISAM cannot read and write concurrently, its processing power has nothing to do with the number of cores.

[Figure: InnoDB and MyISAM performance comparison]

7. Recommend self-incrementing id as the primary key problem

  • The B+ tree of a secondary index stores the value of the primary key in its leaf nodes; if the primary key value is large, every secondary index takes up more storage space.

  • With an auto-increment id as the primary key, new rows are simply appended at the end of the current page, so inserts are naturally in order and no extra maintenance is needed.

  • Page splits are easier to manage: a page split occurs only when the current page is almost full. If the primary key is not an auto-increment id, new rows may be inserted into the middle of a page, the data on a page changes frequently, and page splits become much more expensive to maintain.

8. Why MySQL's auto-increment primary key is not continuous

  • In MySQL 5.7 and earlier versions, the self-increment value is stored in memory and not persisted;

  • Unique key conflict: when inserting, the auto-increment counter is advanced first; if the insert then fails because of a unique key conflict, the counter is not rolled back, leaving a gap;

  • Transaction rollback: similar to the unique key conflict, the auto-increment value is not rolled back when the transaction is rolled back. The main reason for this behaviour is performance.
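
A small sketch of the unique-key-conflict case (assuming a table t with an auto-increment id and a unique column c):

CREATE TABLE t (id INT AUTO_INCREMENT PRIMARY KEY, c INT UNIQUE);
INSERT INTO t (c) VALUES (1);   -- gets id = 1
INSERT INTO t (c) VALUES (1);   -- duplicate-key error, but the counter has already advanced
INSERT INTO t (c) VALUES (2);   -- gets id = 3, so id = 2 is skipped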

9. What does redo log do?

The redo log is specific to the InnoDB storage engine; it is what gives MySQL the ability to recover from a crash.

For example, if the MySQL instance hangs or the machine goes down, on restart the InnoDB storage engine uses the redo log to recover the data, ensuring its durability and integrity.

When table data is updated, if the page to be updated is already in the Buffer Pool, it is updated directly there; the change ("what modification was made on which data page") is then recorded in the redo log cache (redo log buffer) and later flushed to the redo log file.

10. Timing of flushing redo log

[Figure: redo log buffer, OS page cache, and disk]

  • The red part is the redo log buffer which belongs to the memory

  • The yellow part is the OS page cache: the log has been written to the file system cache but has not yet been persisted to disk

  • The green part is the hard disk, which has been persisted

The InnoDB storage engine provides the innodb_flush_log_at_trx_commit parameter for the flushing strategy of the redo log, which supports three strategies

  • When it is set to 0, committing a transaction does not write to disk at all; the log only stays in the redo log buffer, so a MySQL crash can lose up to 1 second of data;

  • When it is set to 1 (the default), every transaction commit flushes the log to disk, so it is persisted immediately;

  • When it is set to 2, every transaction commit only writes the content of the redo log buffer to the OS page cache; because it has not been persisted, an OS crash can lose up to 1 second of data;

The innodb_flush_log_at_trx_commit parameter defaults to 1, which means that when the transaction is committed, fsync (synchronous operation) will be called to flush the redo log.

In addition, the InnoDB storage engine has a background thread that writes the contents of the redo log buffer to the file system cache (page cache) every 1 second, and then calls fsync to flush the disk.

When the space occupied by the redo log buffer is about to reach half of the innodb_log_buffer_size, the background thread will actively flush the disk.
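
The flushing strategy can be inspected and changed through the system variables, for example:

SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
SET GLOBAL innodb_flush_log_at_trx_commit = 1;   -- default: fsync the redo log on every commit
SHOW VARIABLES LIKE 'innodb_log_buffer_size';    -- size of the redo log buffer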

11. How does redo log record logs?

The redo log on disk is not a single log file but a log file group, and every redo log file in the group has the same size.

For example, it can be configured as a group of 4 files of 1GB each, so the entire redo log file group can record 4GB of content.

The group is used as a ring buffer: writing starts at the beginning, and when the end is reached it wraps around to the beginning again, as shown in the figure below.

[Figure: the redo log file group used as a ring buffer]

Therefore, if the log files fill up before the corresponding data has been flushed to disk, writes have to pause; from the outside MySQL appears to hang for a moment while it is busy flushing to disk.

12. What is binlog

binlog is an archive log, which belongs to the server layer log. It is a file in binary format. The record content is the original logic of the statement, which is similar to "add 1 to the c field of the line ID=2".

Regardless of the storage engine used, as long as table data is updated, binlog records are generated. Its main uses are data backup and master-slave replication.

The binlog records all logical operations that update data; it is a logical log and is written by appending (sequentially).

13. Binlog record format

The binlog has three formats, which can be specified with the binlog_format parameter.

  • statement: records the original text of the SQL statement; it can cause data-consistency problems (for example, statements containing non-deterministic functions may replay differently on a replica);

  • row: records the specific data affected by the operation, which guarantees that the replicated data stays consistent;

  • mixed: a mixture of the two; MySQL judges whether a given SQL statement may cause data inconsistency: if so, the row format is used, otherwise the statement format is used.
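
The current format can be checked and switched through the binlog_format variable, for example:

SHOW VARIABLES LIKE 'binlog_format';
SET GLOBAL binlog_format = 'ROW';   -- STATEMENT | ROW | MIXED
SHOW BINARY LOGS;                   -- list the binlog files on the server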

14. Binlog writing mechanism

During transaction execution, the log is first written to the binlog cache; when the transaction commits, the binlog cache is written to the binlog file.

Because a transaction's binlog cannot be split up, no matter how large the transaction is it must be written in one go, so the system allocates a block of memory to each thread as its binlog cache.

The binlog_cache_size parameter controls the binlog cache size of a single thread; if the content exceeds this size, it is temporarily swapped to disk.

binlog also provides the sync_binlog parameter to control the timing of writing to page cache and disk:

  • 0: every transaction commit only writes to the file system page cache, and the system decides when to execute fsync; if the machine goes down, the binlog still in the page cache is lost.

  • 1: fsync is executed on every transaction commit, just like the redo log flushing process.

  • N (N>1): every transaction commit writes to the file system page cache, but fsync is executed only after N transactions have accumulated; if the machine goes down, the binlog of the most recent N transactions is lost.
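
A quick look at the related variables (the values are only illustrative):

SHOW VARIABLES LIKE 'sync_binlog';         -- 0, 1, or N as described above
SHOW VARIABLES LIKE 'binlog_cache_size';   -- per-thread binlog cache size
SET GLOBAL sync_binlog = 1;                -- the safest setting: fsync on every commit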

15. What is the difference between redolog and binlog

  • The redo log is specific to InnoDB, while the binlog lives at the server layer and is available to all storage engines;

  • The redo log is a physical log: it records what modification was made on which data page; the binlog is a logical log: it records the logical content of the operation;

  • The binlog grows by appending: when a file reaches its size limit (or on flush logs) a new file is created; the redo log has a fixed size and is written cyclically;

  • The binlog has no crash-safe capability and can only be used for archiving and replication, while the redo log is crash-safe;

  • The redo log can be written continuously while the transaction executes (flushed when innodb_flush_log_at_trx_commit is 1, when the background thread runs every 1 second, or when the redo log buffer reaches about half of innodb_log_buffer_size), while the binlog is written to the file system cache only when the transaction commits;

16. Two-phase commit

        Assuming that after the redo log is written in the process of executing sql, an exception occurs during the writing of the binlog log, what will happen?

        Because the exception occurred before the binlog was finished, there is no corresponding modification record in the binlog at this point. Therefore, when the binlog is later used to restore data, this update will be missing, and the final data will be inconsistent.

In order to solve the problem of logical consistency between two logs, the InnoDB storage engine uses a two-phase commit scheme.

        The writing of the redo log is split into two steps, prepare and commit; this is the two-phase commit. With two-phase commit, a failure while writing the binlog no longer causes inconsistency: when MySQL recovers from the redo log, it finds the redo log still in the prepare stage and no corresponding binlog record, so the transaction is rolled back.

        Let's look at another scenario: an exception occurs in the commit phase of the redo log. Will the transaction be rolled back?

        No, the transaction will not be rolled back. Although the redo log is only in the prepare stage, the corresponding binlog record can be found through the transaction ID, so MySQL considers the transaction complete and commits it when recovering the data.

17. What is undo log.

        We know that to guarantee the atomicity of transactions, operations that have already been executed (INSERT, DELETE, UPDATE) must be undone when an exception occurs. In MySQL, this rollback mechanism is implemented with the undo log: every modification a transaction makes is first recorded in the undo log, and only then is the actual operation performed.

        Every time a record is changed, an undo log entry is written, and each entry carries a DB_ROLL_PTR pointer; through these pointers the undo log entries are linked together into a list, forming a version chain.

        The head node of the version chain is the latest value of the current record.

[Figure: undo log version chain]

18. What is relaylog

Relaylog is a relay log, which is used during master-slave synchronization . It is an intermediary temporary log file used to store the binlog log content synchronized from the master node.

[Figure: binlog and relay log in master-slave replication]

        After the master node's binlog is transmitted to the slave node, it is written into the relay log; the slave node's SQL thread then reads the log from the relay log and replays it locally on the slave.

        In other words, the slave's I/O thread reads the master's binary log and records it into a local file on the slave, and the SQL thread then reads the relay log content and applies it to the slave, so that the slave's data stays consistent with the master's.

19. Index

        Index is actually a data structure that can help us quickly retrieve data in the database.

        The function of the index is equivalent to the table of contents of the book. For example: when we look up a dictionary, if there is no directory, then we can only find the word we need to look up page by page, and the speed is very slow. If there is a table of contents, we only need to go to the table of contents to find the position of the word, and then directly turn to that page.

20. Hash index

        A hash table is a collection of key-value pairs. The corresponding value (value) can be quickly retrieved through the key (key), so the hash table can quickly retrieve data (close to O(1)).

        but! The hash algorithm has a Hash conflict problem, which means that multiple different keys finally get the same index. Usually, our common solution is the chain address method .

        The chain address method is to store the hash collision data in the linked list. For example, before JDK1.8, HashMap used the chain address method to resolve hash conflicts. However, after JDK1.8, HashMap introduced a red-black tree in order to reduce the search time when the linked list is too long.

        In order to reduce the occurrence of Hash collisions, a good hash function should "uniformly" distribute data in the entire set of possible hash values.

        Since the hash table is so fast, why doesn't MySQL use it as its index data structure? Mainly because hash indexes do not support ordered and range queries: if we want to sort the data in a table or run a range query, a hash index cannot help, and every value would have to be looked up individually, one IO at a time.

21. B-tree and B+ tree

  • All nodes of the B-tree store both keys and data, while only leaf nodes of the B+ tree store keys and data, and other internal nodes only store keys.

  • The leaf nodes of the B tree are all independent; the leaf nodes of the B+ tree have a reference chain pointing to its adjacent leaf nodes.

  • The retrieval process of a B-tree is equivalent to doing a binary search over the keys of each node along the path, and the search may finish before reaching a leaf node. Retrieval in a B+ tree is very stable: every search is a path from the root to a leaf, and the linked leaf nodes make sequential (range) scans straightforward.

22. Primary key index

The primary key column of the data table uses the primary key index, a special unique index.

In an InnoDB table, when no primary key is explicitly specified, InnoDB checks whether the table has a unique index on a column that does not allow NULL values; if so, that column is chosen as the default primary key. Otherwise, InnoDB automatically creates a hidden 6-byte auto-increment row id as the primary key.

23. Secondary index

        The secondary index is also called the auxiliary index because the data stored in the leaf nodes of the secondary index is the primary key. That is to say, through the secondary index, the position of the primary key can be located.

Indexes such as unique indexes, ordinary indexes, and prefix indexes are secondary indexes.

  • Unique index (Unique Key): a unique index is also a constraint; the values of the indexed column must be unique, but NULL values are allowed. For a composite unique index, the combination of column values must be unique. A table may have multiple unique indexes. Most of the time a unique index is created to guarantee the uniqueness of the data in the column, not for query efficiency.

  • Ordinary index (Index): the only purpose of an ordinary index is to speed up queries. A table may have multiple ordinary indexes, and duplicate values and NULL are allowed.

  • Prefix index (Prefix): a prefix index applies only to string columns. It indexes only the first few characters of the text, so the index data is smaller than that of an ordinary index on the whole column.

  • Composite index: an index created on multiple fields. The index is used only when the first field of the index appears in the query condition; when using a composite index, follow the leftmost prefix principle (described later);

  • Full-text index (Full Text): a full-text index is mainly used to search for keywords in large text data, a technique typical of search engines. Before MySQL 5.6 only the MyISAM engine supported full-text indexes; since 5.6 InnoDB supports them as well.

The full-text index in MySQL has two variables, the minimum search length and the maximum search length . Words whose length is less than the minimum search length and greater than the maximum search length will not be indexed.
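
A few sketches of creating the index types above (the table t_user and its columns are hypothetical):

CREATE UNIQUE INDEX uk_user_email   ON t_user (email);        -- unique index
CREATE INDEX        idx_user_name   ON t_user (name);         -- ordinary index
CREATE INDEX        idx_user_addr   ON t_user (address(10));  -- prefix index on the first 10 characters
CREATE INDEX        idx_name_age    ON t_user (name, age);    -- composite index
CREATE FULLTEXT INDEX ft_user_intro ON t_user (intro);        -- full-text index (InnoDB since 5.6)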

24. Clustered index and non-clustered index

        A clustered index is an index in which the index structure and data are stored together, not a separate index type. The leaf nodes of InnoDB's primary key index store data rows, so it belongs to the clustered index.

        In MySQL, the .ibd file of an InnoDB table contains both the table's indexes and its data. For an InnoDB table, the non-leaf nodes of the table's B+ tree index store index keys, and the leaf nodes store the data corresponding to those keys.

        A non-clustered index is an index in which the index structure and data are stored separately, not a separate index type. Secondary indexes (auxiliary indexes) are non-clustered indexes. MySQL's MyISAM engine, regardless of primary key or non-primary key, uses non-clustered indexes.

        A secondary (auxiliary) index is an index we create ourselves; its leaf nodes store the primary key. After finding the primary key through the secondary index, we use that primary key to search the primary key index and fetch the full row, which is the so-called back-to-table lookup.

25. Back to table

        A back-to-table lookup means that a query first scans a secondary index tree to find the matching rows and obtain their primary key ids, and then uses those ids to fetch the full rows from the primary key index. In other words, a query that goes through a non-primary-key index has to scan one additional index tree.

26. Covering index and joint index

        If an index contains (or covers) the values of all the fields that need to be queried, we call it a "covering index". It means the data we need can be obtained from the index itself, without going back to the data table (a back-to-table lookup), which reduces database IO and improves query efficiency.

Using multiple fields in a table to create an index is a joint index, also called a composite index or a composite index.
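
For example, with a hypothetical table t_user and a joint index on (name, age), the first query below is covered by the index and needs no back-to-table lookup (EXPLAIN shows "Using index" in the Extra column), while the second one must go back to the table for the remaining columns:

ALTER TABLE t_user ADD INDEX idx_name_age (name, age);
EXPLAIN SELECT name, age FROM t_user WHERE name = '张三';   -- covering index, Extra: Using index
EXPLAIN SELECT *         FROM t_user WHERE name = '张三';   -- back-to-table lookup needed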

27. The leftmost prefix matching principle

The principle of leftmost prefix matching means that when using a joint index, MySQL will match the query conditions from left to right according to the order of the fields in the joint index. If there is a matching field, it will use this field to filter a batch of data until all the fields in the joint index are matched, or a range query is encountered during execution, such as >, <, between, and like queries starting with %. will stop matching.

Therefore, when we use a joint index, we can place the highly discriminative fields on the far left, which can also filter more data.
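
Assuming the same hypothetical table t_user with the joint index idx_name_age (name, age), a sketch of which conditions can use it:

SELECT * FROM t_user WHERE name = '张三';                  -- uses the index (leftmost column)
SELECT * FROM t_user WHERE name = '张三' AND age = 18;     -- uses both columns of the index
SELECT * FROM t_user WHERE age = 18;                       -- cannot use the index: leftmost column missing
SELECT * FROM t_user WHERE name LIKE '张%' AND age = 18;   -- range on name; age no longer narrows the index lookup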

28. Index push down

Index Condition Pushdown (ICP) is an index optimization feature introduced in MySQL 5.6. While traversing a secondary (non-clustered) index, the server can evaluate the conditions on the columns contained in the index first, filter out records that do not qualify, and thereby reduce the number of back-to-table lookups.
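
Continuing the hypothetical example above: with index condition pushdown enabled (the default since 5.6), the age condition is evaluated inside the index scan before going back to the table, and EXPLAIN shows "Using index condition":

EXPLAIN SELECT * FROM t_user WHERE name LIKE '张%' AND age = 18;
-- Extra: Using index condition  (age filtered at the index level, fewer back-to-table lookups)
SET optimizer_switch = 'index_condition_pushdown=off';   -- can be toggled for comparison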

29. Implicit conversion

When operators are used with operands of different types, type conversion occurs to make the operands compatible. Some conversions happen implicitly. For example, MySQL automatically converts strings to numbers and vice versa as needed. The following rules describe how comparison operations are transformed:

  1. When at least one of the two parameters is NULL, the comparison result is also NULL. In a special case, when using <=> to compare two NULLs, it will return 1. In both cases, no type conversion is required;

  2. Both parameters are strings, and will be compared according to strings without type conversion;

  3. Both parameters are integers, compared according to integers, without type conversion;

  4. When a hexadecimal value is compared with a non-number, it is treated as a binary string;

  5. If one parameter is TIMESTAMP or DATETIME and the other is a constant, the constant is converted to a timestamp before the comparison;

  6. One parameter is of decimal type. If the other parameter is decimal or integer, the integer will be converted to decimal for comparison. If the other parameter is floating point, decimal will be converted to floating point for comparison;

  7. In all other cases, both arguments are converted to floats and compared;
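
A classic pitfall that follows from rule 7: comparing a string column with a number forces every value of the column to be converted to a float, so an index on that column cannot be used (phone is a hypothetical VARCHAR column):

SELECT * FROM t_user WHERE phone = 13800001234;     -- implicit conversion, index on phone is not used
SELECT * FROM t_user WHERE phone = '13800001234';   -- compared as strings, index can be used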

30. How to choose ordinary index and unique index?

  • Query

    • With an ordinary index, after the first matching record is found, scanning continues to the following records until one no longer satisfies the condition;

    • With a unique index, the first matching record is returned directly and scanning stops, because there can be at most one match;

  • Update

    • With an ordinary index, the change can simply be recorded in the change buffer and the operation ends there;

    • With a unique index, the data page must be read to check whether the new value conflicts, so the change buffer cannot be used.

Therefore, unique indexes are more suitable for query scenarios, and ordinary indexes are more suitable for insertion scenarios.

31. Avoid index failure

Index failure is also one of the main reasons for slow queries. The common situations that lead to index failure are as follows:

  • Query with SELECT *;

  • A composite index is created, but the query condition does not comply with the leftmost matching principle;

  • Perform calculations, functions, or type conversions on indexed columns (see the sketch after this list);

  • LIKE queries starting with %, such as LIKE '%abc';

  • If OR is used in the query condition and any column on either side of the OR is not indexed, none of the involved indexes will be used;

  • The columns specified in the MATCH() function must be exactly those specified in the full-text index, otherwise an error is reported and the full-text index cannot be used;

  • With full-text indexes, pay attention to the minimum and maximum search lengths: words outside this range are not indexed, so searching for them cannot use the index.
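
Two small sketches of the "operations on indexed columns" case (t_order and its indexed create_time column are hypothetical):

-- index fails: the function is applied to the indexed column
SELECT * FROM t_order WHERE DATE(create_time) = '2023-07-01';
-- index can be used: rewrite the condition as a range on the raw column
SELECT * FROM t_order WHERE create_time >= '2023-07-01' AND create_time < '2023-07-02';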

32. Rules for indexing

  • Fields that are not NULL: the indexed field should avoid NULL as much as possible, because it is hard for the database to optimize on columns containing NULL. If the field is queried frequently but NULL cannot be avoided, it is recommended to use a short, clearly defined substitute value such as 0, 1, true, or false.

  • Frequently queried fields: The fields we create indexes should be fields that are frequently queried.

  • Fields queried as conditions: Fields queried as WHERE conditions should be considered for indexing.

  • Fields that frequently need to be sorted: the index has been sorted, so that the query can use the sorting of the index to speed up the sorting query time.

  • Fields that are frequently used for connection: Fields that are often used for connection may be some foreign key columns. For foreign key columns, it is not necessary to establish a foreign key, just that the column involves the relationship between tables. For fields that are frequently queried by joins, indexing can be considered to improve the efficiency of multi-table join queries.

  • Frequently updated fields should be carefully indexed;

  • Consider building joint indexes instead of single-column indexes as much as possible;

  • Consider using prefix indexes instead of normal indexes on fields of string type;

  • Delete indexes that have not been used for a long time;

33. Transaction ACID characteristics

A transaction consists of n units of work; during execution these n units either all succeed or all fail, and that is what putting n units in one transaction means. A simple example: leaving aside whether the answers are right or wrong, a test paper consists of multiple questions, but it is handed in to the teacher as a whole rather than question by question; the whole paper can be understood as a transaction here.

The characteristics of the transaction:

  • A: Atomicity ( Atomicity), atomicity means that a transaction is an indivisible unit of work, and the operations in a transaction either all occur or none occur.

  • C: Consistency ( Consistency), in a transaction, the integrity of the data before and after the transaction must be consistent.

  • I: Isolation ( Isolation), which exists in multiple transactions. The isolation of transactions means that when multiple users access the database concurrently, the transactions of one user cannot be interfered by the transactions of other users, and the data between multiple concurrent transactions must be mutually isolation.

  • D: Durability ( Durability), durability means that once a transaction is committed, its changes to the data in the database are permanent; even if the database fails afterwards, the committed changes are not affected.
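
A minimal transaction sketch (the account table and the amounts are made up) showing atomicity in practice:

START TRANSACTION;
UPDATE account SET balance = balance - 100 WHERE id = 1;
UPDATE account SET balance = balance + 100 WHERE id = 2;
COMMIT;       -- both updates become permanent together
-- ROLLBACK;  -- or, if anything goes wrong, both updates are undone together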

34. Problems caused by concurrent transactions

  • Dirty read: Transaction B reads data that has not been committed by transaction A;

  • Lost update (lost modification): when one transaction reads a piece of data, another transaction also accesses it; after the first transaction modifies the data, the second transaction modifies it as well, so the modification made by the first transaction is lost. This is called a lost update.

  • Non-repeatable read: transaction B reads data committed by transaction A, i.e. the content transaction B reads before and after transaction A commits is different (A and B operate on the same row);

  • Phantom read (virtual read): transaction B reads data committed by transaction A where A performed an insert, i.e. the number of rows transaction B reads before and after A's transaction is different.

35. Transaction isolation level

In order to solve the concurrency problems described above, the database provides a transaction isolation mechanism.

  • read uncommitted: changes made by a transaction can be seen by other transactions before it commits; uncommitted data is read, and none of the problems above are solved;

  • read committed: changes made by a transaction can be seen by other transactions only after it commits; only committed data is read, which solves dirty reads (the default in Oracle);

  • repeatable read: the data seen during a transaction is always consistent with what was seen when the transaction started, which solves dirty reads and non-repeatable reads (the default in MySQL);

  • serializable: as the name implies, for the same row a "write" takes a write lock and a "read" takes a read lock. When a read-write lock conflict occurs, the later transaction must wait for the earlier one to finish before it can continue. This solves dirty reads, non-repeatable reads, and phantom reads, and is roughly equivalent to locking the table.

Although the serializable level can solve all database concurrency problems, it will lock every row of data read, which may cause a lot of timeout and lock competition problems, resulting in a decrease in efficiency. Therefore, we rarely use serializable in practical applications. Only when it is very necessary to ensure data consistency and can accept no concurrency, should we consider adopting this level.
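
The isolation level can be inspected and changed per session or globally, for example:

SHOW VARIABLES LIKE 'transaction_isolation';   -- named tx_isolation before MySQL 5.7.20
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET GLOBAL TRANSACTION ISOLATION LEVEL REPEATABLE READ;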

36. MVCC

        If the granularity of the lock is too large, the performance will decrease. There is a MVCC method with better performance under the InnoDB engine of MySQL.

        MVCC is the abbreviation of Multi-Version Concurrency Control, a multi-version concurrency control protocol that avoids contention between different transactions on the same data by means of version numbers. Its main purpose is to improve the concurrent read/write performance of the database: multiple transactions can read and write concurrently without locking.

        The implementation of MVCC relies on hidden columns, Undo log, Read View .

        From the above introduction to the four isolation levels defined by the SQL standard, it can be seen that in the standard SQL isolation level definition, REPEATABLE-READ (repeatable read) cannot prevent phantom reading .

        However, the REPEATABLE-READ isolation level implemented by InnoDB can actually solve the problem of phantom reading, mainly in the following two situations:

  • Snapshot read: The MVCC mechanism ensures that phantom reads do not occur.

  • Current read: locking with Next-Key Locks ensures that phantom reads do not occur. A Next-Key Lock is the combination of a row lock (Record Lock) and a gap lock (Gap Lock): a record lock can only lock existing rows, so gap locks are needed to prevent new rows from being inserted.

The InnoDB storage engine generally uses the SERIALIZABLE isolation level in the case of distributed transactions.

37. Locks in Mysql

        By operation type, locks can be divided into read locks and write locks; like the read/write locks in Java, they can be understood as shared locks and exclusive locks. By granularity, locks can be divided into row locks, page locks, and table locks; row locks and table locks are the ones we use most. Granularity refers to the scope covered by the lock, which directly affects the degree of concurrency: row locks allow the highest concurrency but cost more to acquire, and are typical of the InnoDB engine; table locks are cheap to acquire but cover a large range and allow the lowest concurrency, and are typical of the MyISAM engine. Because read locks are shared, MyISAM is better suited to read-heavy query scenarios.

        We know that locks and transaction isolation levels both exist to deal with concurrency. Isolation levels can be understood with the help of the redo and undo logs, so how do they relate to locks? Roughly speaking, the lock mechanism provides coarse-grained control; because of how data is stored, reading and writing are not instantaneous, which is what gives rise to dirty reads, dirty writes, non-repeatable reads, and phantom reads, and many of these problems are solved with the help of the MVCC mechanism rather than by locks alone.
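
A short sketch of explicitly taking the two kinds of row locks inside a transaction (t_user is hypothetical):

START TRANSACTION;
SELECT * FROM t_user WHERE id = 1 LOCK IN SHARE MODE;   -- shared (read) lock; FOR SHARE in MySQL 8.0
SELECT * FROM t_user WHERE id = 1 FOR UPDATE;           -- exclusive (write) lock
COMMIT;   -- row locks are released when the transaction ends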

38. Query statement execution process

select * from tb_student  s where s.age='18' and s.name=' 张三 ';
  • First check whether the statement has permission. If not, an error message is returned directly. If it does, then before MySQL 8.0 the query cache is checked first, using this SQL statement as the key to look up a result in memory; if a cached result exists it is returned directly, otherwise go to the next step.

  • The analyzer performs lexical analysis to extract the key elements of the SQL statement: the statement above is a select query on the table tb_student, all columns are to be queried, and the query conditions are age='18' and name='张三'. Then it checks whether the SQL statement has syntax errors, for example whether the keywords are correct; if everything checks out, go to the next step.

  • The next step is for the optimizer to determine the execution plan. The above SQL statement can have two execution plans:

    • a. First query the student whose name is "Zhang San" in the student table, and then determine whether the age is 18.

    • b. First find out the students who are 18 years old, and then among them query the students whose name is "Zhang San". The optimizer then chooses the plan it considers most efficient according to its own optimization algorithm (what the optimizer thinks is best is not always actually best). Once the execution plan is confirmed, it is ready to start execution.

  • Perform permission verification, if there is no permission, an error message will be returned, if there is permission, the database engine interface will be called, and the execution result of the engine will be returned.

The execution process of the query statement is as follows: permission verification (if it hits the cache) ---> query cache ---> analyzer ---> optimizer ---> permission verification ---> executor ---> engine

39. Update statement execution process

update tb_student A set A.age='19' where A.name=' 张三 ';

This statement basically follows the same flow as the query above, except that an update also has to write logs, which brings in the log modules. The log module that comes with MySQL itself is the binlog (archive log), which all storage engines can use; our commonly used InnoDB engine additionally has its own log module, the redo log. We will discuss the execution process of this statement under InnoDB.

  • First query the data of Zhang San, if there is a cache, it will also use the cache.

  • Then it takes the query result, changes the age to 19, and calls the engine API to write this row of data. The InnoDB engine saves the data in memory and records the redo log at the same time; the redo log enters the prepare state, and the engine then tells the executor that execution is complete and the transaction can be committed at any time.

  • After receiving the notification, the executor records the binlog, then calls the engine interface again to set the redo log to the commit state.

  • update completed.

The execution flow of the update statement is as follows: analyzer---->permission verification---->executor--->engine---redo log(prepare status)--->binlog--->redo log(commit state)

40. SQL optimization

  1. Full table scans should be avoided as much as possible, and indexes should first be considered on the columns involved in where and order by;

  2. Try to avoid using the following statements in the where clause, otherwise the engine will give up using the index and perform a full table scan;

    • Null-value checks on the column,

    • Use != or <>

    • or to connect conditions (use union all instead)

    • in and not in should also be used with caution

    • Fuzzy queries with a leading wildcard (a full-text index can be used instead)

    • Reduce expression operations

    • function operation

  3. Do not use select * from t anywhere, replace "*" with a specific field list, and do not return any fields that are not used;

  4. It is best not to have more than 6 indexes in a table. If there are too many, you should consider whether it is necessary to build indexes on some columns that are not frequently used;

  5. In many cases, it is a good choice to use exists instead of in;

  6. Minimize multi-table joint queries;

  7. Pagination optimization (see the sketch after this list);

  8. Use indexes correctly;
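
As an example of item 7, deep pagination can be optimized with a "deferred join": first locate the primary keys through the index, then fetch only the needed rows (t_order is hypothetical):

-- slow: scans and discards the first 1,000,000 rows
SELECT * FROM t_order ORDER BY id LIMIT 1000000, 10;
-- faster: the subquery walks only the primary key index, then just 10 full rows are fetched
SELECT o.* FROM t_order o
JOIN (SELECT id FROM t_order ORDER BY id LIMIT 1000000, 10) tmp ON o.id = tmp.id;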

41. Master-slave synchronization data

[Figure: master-slave replication flow]

  • The master writes this update event to its own binlog file

  • The master creates a log dump thread to notify the slave that data needs to be updated

  • The slave sends a request to the master node and saves the content of the binlog file to the local relaylog

  • The slave starts the sql thread to read the content in the relaylog, and re-executes the content locally to complete the master-slave data synchronization

Synchronization strategy :

  • Full synchronous replication : the master library forcibly synchronizes logs to the slave library, and returns to the client after all the slave libraries are executed, which has poor performance;

  • Semi-synchronous replication : the master considers the operation successful once at least one slave has received the log, written it successfully, and returned an ack confirmation;
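
Semi-synchronous replication is provided by plugins; a hedged sketch of enabling it and checking replication status (plugin file names can differ by platform):

-- on the master
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
-- on the slave
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
SHOW SLAVE STATUS\G   -- Seconds_Behind_Master shows the current replication delay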

42. How to solve the master-slave delay

  • Since MySQL 5.6, parallel replication is available: the relay log is replayed by multiple worker threads instead of a single SQL thread

  • Improve the machine configuration (the most direct approach)

  • Choose an appropriate sharding (sub-database and sub-table) strategy at the start of the business, to avoid the extra replication pressure caused by an overly large single database or table

  • avoid long transactions

  • Avoid letting the database perform various large-scale operations

  • For some businesses that are sensitive to delay, directly use the main library to read

43. Why not use long transactions

  • Under concurrency, the database connection pool is easily exhausted

  • It is easy to cause a lot of blocking and lock timeout , long transactions also occupy lock resources, and may also drag down the entire library

  • Long execution time, easy to cause master-slave delay

  • Rollback takes a relatively long time : the longer the transaction, the more modifications have to be undone

  • The undolog log is getting bigger and bigger , and long transactions mean that there will be very old transaction views in the system. Since these transactions may access any data in the database at any time, before the transaction is committed, the rollback records it may use in the database must be kept, which will result in a large amount of storage space being occupied.
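
Long-running transactions can be located through information_schema; a small sketch (the 60-second threshold is arbitrary):

SELECT trx_id, trx_started, trx_mysql_thread_id
FROM information_schema.INNODB_TRX
WHERE TIMESTAMPDIFF(SECOND, trx_started, NOW()) > 60;   -- transactions running for more than 60 seconds
-- KILL <trx_mysql_thread_id>;   -- such a transaction can then be killed if necessary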

Origin: blog.csdn.net/m0_73367097/article/details/131716495