This article will take you through the MySQL database!


The article is about 12,700 words long and takes roughly 42 minutes to read. It is recommended to bookmark it and read it slowly!

1. Index

  1. Why use an index

    • By creating a unique index, the uniqueness of each row of data in the database table can be guaranteed.
    • It can greatly speed up the retrieval of data, which is the main reason for creating indexes.
    • Helps the server avoid sorts and temporary tables.
    • Turn random IO into sequential IO.
    • It can speed up joins between tables, which is especially useful for enforcing the referential integrity of data.
  2. Classification of Index

    • Ordinary index : speeds up queries only
    • Unique index : speeds up queries + enforces unique column values (NULL is allowed)
    • Primary key index : speeds up queries + enforces unique column values (NULL is not allowed) + only one per table
    • Composite index : an index built on multiple columns, used for combined searches; it is more efficient than index merging
    • Full-text index : splits text content into words and searches on them
    • Index merge : combines searches over several single-column indexes
    • Covering index : all the selected columns can be obtained from the index alone, without reading the data rows; in other words, the query columns must be covered by the index
    • Clustered index : the table data is stored together with the primary key. The leaf nodes of the primary key index store the row data (including the primary key value), while the leaf nodes of a secondary index store only the primary key value of the row. A B+ tree is used as the storage structure: non-leaf nodes hold only index keys and do not store the record content (or its address); the leaf nodes hold the primary keys and the actual records (the data content).
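
    As a rough illustration of these index types, here is a minimal SQL sketch (the table and column names are hypothetical, not from this article) showing how each kind of index can be declared in MySQL:

      CREATE TABLE product (
        id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
        code        VARCHAR(32)  NOT NULL,
        name        VARCHAR(100) NOT NULL,
        category_id INT UNSIGNED NOT NULL,
        price       DECIMAL(10,2) NOT NULL,
        description TEXT,
        PRIMARY KEY (id),                          -- primary key index (the clustered index in InnoDB)
        UNIQUE KEY uk_code (code),                 -- unique index
        KEY idx_name (name),                       -- ordinary (secondary) index
        KEY idx_cat_price (category_id, price),    -- composite index on two columns
        FULLTEXT KEY ft_desc (description)         -- full-text index
      ) ENGINE = InnoDB;
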
  3. When do you need/don't create an index

    The biggest advantage of the index is to improve the query speed, but the index also has disadvantages, such as:

    • Need to occupy physical space, the larger the number, the larger the occupied space;
    • It takes time to create and maintain indexes, and this time increases as the amount of data increases;
    • It reduces the efficiency of inserts, deletes, and updates: every time data is added, deleted, or modified, the B+ tree has to be dynamically maintained to keep the index in order.

    Therefore, an index is not a silver bullet; whether to use one depends on the scenario.

    When should indexes be used?

    • Fields with unique restrictions, such as commodity codes;
    • Fields that are often used in WHERE query conditions, which can speed up queries over the whole table. If the condition involves more than one field, a composite (joint) index can be created.
    • Fields that are often used in GROUP BY and ORDER BY, so that the query does not need to sort again, because the records in the B+Tree are already sorted once the index is built.

    When do you not need to create an index?

    • Fields that are not used in WHERE, GROUP BY, or ORDER BY conditions. The value of an index is fast positioning; if a field is never used to locate records, there is usually no need to index it, since the index would only take up physical space.
    • Fields with a lot of duplicate data do not need an index. For example, a gender field has only "male" and "female"; if the two values are evenly distributed in the table, then whichever value is searched for, about half of the rows may be returned. In such cases it is better not to index, because MySQL has a query optimizer: when it finds that a value appears in a high percentage of the rows in the table, it generally ignores the index and performs a full table scan anyway.
    • When the table has very little data, there is no need to create an index.
    • Frequently updated fields do not need indexes: if an indexed field is modified frequently, the order of the B+Tree has to be maintained on every change, so the index must be reorganized frequently, which hurts database performance.
  4. Methods for Optimizing Indexes

    • Prefix index optimization;

      A prefix index, as the name implies, is an index built on the first few characters of a string column.

      The purpose of a prefix index is to reduce the size of the indexed field, so that more index values fit in one index page, which effectively improves index query speed. When large string fields are used as indexes, a prefix index helps keep the index entries small.
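
      A minimal sketch of a prefix index, assuming a hypothetical user table with a long email column (only the first 10 characters are indexed); note that a prefix index cannot serve as a covering index or help ORDER BY on that column:

        -- Index only the first 10 characters of the column
        ALTER TABLE user ADD INDEX idx_email_prefix (email(10));

        -- Check how selective the chosen prefix length is (closer to 1 is better)
        SELECT COUNT(DISTINCT LEFT(email, 10)) / COUNT(*) AS selectivity FROM user;
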

    • Covering index optimization;

      A covering index means that all the columns a query needs can be found in the leaf nodes of a secondary index's B+Tree, so the records can be returned from the secondary index alone and do not have to be fetched through the clustered index; this avoids the "back to table" (table lookup) operation.

      The advantage of using a covering index is that you don't need to query all the information that contains the entire row of records, which reduces a lot of I/O operations.
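
      For example, assuming a hypothetical orders table with a composite index on (user_id, status), the following query can be served entirely from the secondary index, so no table lookup is needed:

        ALTER TABLE orders ADD INDEX idx_user_status (user_id, status);

        -- Both the WHERE column and the selected columns are contained in the index,
        -- so EXPLAIN shows "Using index" in the Extra column (covering index)
        EXPLAIN SELECT user_id, status FROM orders WHERE user_id = 42;
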

    • The primary key index is preferably self-incrementing;

      If we use an auto-increment primary key, every newly inserted row is appended after the current last index entry, in order, and existing data does not have to be moved; when a page is full, a new page is opened automatically. Because each insert is an append operation that never requires moving existing data, this way of inserting is very efficient.

      The primary key field should also not be too long: the shorter the primary key, the smaller the leaf nodes of the secondary indexes (which store the primary key value), and therefore the less space the secondary indexes occupy.

    • Prevent index failure;

      Creating an index does not mean it will actually be used by a query, so we need to know which situations cause an index to fail, in order to avoid writing queries that invalidate the index; otherwise query efficiency will be very low.

      Situations in which an index fails:

      • When we use a left fuzzy match or a left-and-right fuzzy match, i.e. LIKE '%xx' or LIKE '%xx%', the index fails;
      • When we perform calculations, functions, and type conversion operations on index columns in query conditions, these cases will cause index failure;
      • To use the joint index correctly, it is necessary to follow the leftmost matching principle, that is, to match the index according to the leftmost first method, otherwise the index will become invalid.
      • In the WHERE clause, if the conditional column before the OR is an indexed column, but the conditional column after the OR is not an indexed column, the index will fail.
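
      The following sketch illustrates these cases with a hypothetical user table that has an index on name, an indexed VARCHAR phone column, and a composite index on (a, b, c); each commented query would typically not use the corresponding index:

        -- Left fuzzy match: the index on name cannot be used
        SELECT * FROM user WHERE name LIKE '%son';

        -- Function or calculation on the indexed column: index fails
        SELECT * FROM user WHERE LENGTH(name) = 5;
        SELECT * FROM user WHERE id + 1 = 10;

        -- Implicit type conversion: phone is VARCHAR but is compared with a number
        SELECT * FROM user WHERE phone = 13800000000;

        -- Composite index (a, b, c): the leftmost column a is skipped, index fails
        SELECT * FROM user WHERE b = 2 AND c = 3;

        -- OR with a non-indexed column (age): the whole query falls back to a full scan
        SELECT * FROM user WHERE name = 'Tom' OR age = 18;
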
  5. Notes on using indexes

    MySQL indexes are usually used to improve the search speed when matching data rows in WHERE conditions. There are some usage details and precautions during the use of indexes.

    In short, the points to watch are: functions and operations on columns, negation operators, join conditions, multiple single-column indexes, the leftmost prefix principle, range queries, columns containing NULL values, and LIKE patterns.

    • 1) Do not use functions on the column, this will cause the index to fail and perform a full table scan.
    • 2) Try to avoid negation operators such as != or NOT IN
    • 3) Multiple single-column indexes are not the best choice
    • 4) The leftmost prefix principle of composite index
    • 5) Benefits of covering indexes
    • 6) The impact of range queries on multi-column queries
    • 7) The index will not contain columns with NULL values
    • 8) Impact of implicit conversion
    • 9) The index invalidation problem of the like statement
  6. Why do indexes use B+ trees as indexes

    The main reason: in a B+ tree, traversing the whole tree only requires walking the linked leaf nodes, and range queries are extremely common in databases, whereas a B-tree can only do this with an in-order traversal of all nodes, which is far less efficient.

    In addition, the disk read/write cost of a B+ tree is lower, and its query performance is more stable.

    Features of B+ tree

    • All keys appear in the linked list of leaf nodes (a dense index), and the keys in that list are in sorted order;
    • A lookup never terminates at a non-leaf node; it always descends to a leaf;
    • The non-leaf nodes act as an index (a sparse index) over the leaf nodes, while the leaf nodes form the data layer that stores the (key, data) entries.
  7. What are index failures?

    • When we use a left fuzzy match or a left-and-right fuzzy match, i.e. LIKE '%xx' or LIKE '%xx%', the index fails;
    • When we use functions on index columns in query conditions, it will cause the index to fail.
    • When we perform expression calculations on index columns in query conditions, we cannot use indexes.
    • When MySQL compares a string with a number, it automatically converts the string to a number before comparing. If the string is the indexed column and the parameter in the condition is a number, the indexed column undergoes an implicit type conversion. Since this conversion is implemented through the CAST function, it is equivalent to applying a function to the indexed column, so the index fails.
    • To use the joint index correctly, it is necessary to follow the leftmost matching principle, that is, to match the index according to the leftmost first method, otherwise the index will become invalid.
    • In the WHERE clause, if the conditional column before the OR is an indexed column, but the conditional column after the OR is not an indexed column, the index will fail.
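
    In practice, whether a statement actually hits an index can be verified with EXPLAIN: the type, key, and Extra columns of its output show the chosen index and the access method. A minimal sketch, assuming a hypothetical user table with an index named idx_name on the name column:

      EXPLAIN SELECT * FROM user WHERE name LIKE 'To%';   -- type=range, key=idx_name: the index is used
      EXPLAIN SELECT * FROM user WHERE name LIKE '%om';   -- type=ALL, key=NULL: full table scan
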
  8. What is the difference between MyISAM and InnoDB in implementing B-tree indexes?

    • In MyISAM, the data field of a B+Tree leaf node stores the address of the data record. When searching, the index is first searched with the B+Tree search algorithm; if the key exists, the value of the data field is taken out and used as the address to read the corresponding data record. This is called a "non-clustered index".

    • In InnoDB, the table data file is itself an index file. Unlike MyISAM, where the index file and data file are separate, the InnoDB table data file is itself an index structure organized as a B+Tree, and the data field of its leaf nodes holds the complete data record. The key of this index is the table's primary key, so the InnoDB table data file is itself the primary index; this is called a "clustered index". All other indexes are secondary (auxiliary) indexes, and the data field of a secondary index stores the primary key value of the corresponding record rather than its address, which is another difference from MyISAM.

      When searching by the primary index, the node containing the key can be located directly and the data retrieved; when searching by a secondary index, the primary key value is retrieved first and then the primary index is traversed a second time. For this reason, when designing tables it is not recommended to use overly long fields as the primary key, nor to use non-monotonic fields as the primary key, since these cause frequent page splits in the primary index.

2. Transactions

  1. The four characteristics of a transaction

    1. Atomicity: A transaction is the smallest unit of execution and cannot be split. The atomicity of transactions ensures that actions are either fully completed or have no effect at all;
    2. Consistency: Before and after a transaction is executed, the database transitions from one consistent state to another.
    3. Isolation: when the database is accessed concurrently, one user's transaction is not interfered with by other transactions; concurrent transactions are independent of each other;
    4. Durability: once a transaction has been committed, its changes to the data in the database are permanent, and even a database failure should not affect them.
  2. Dirty reads, non-repeatable reads, and phantom reads of transactions

    Dirty read : if a transaction reads data that another, still uncommitted, transaction has modified, a "dirty read" has occurred.

    Phantom read : within one transaction, the same query for "the number of records matching a condition" is run several times; if the counts returned by consecutive queries differ, a "phantom read" has occurred.

    Lost update (discarded modification) : two write transactions T1 and T2 both increment A = 0 at the same time; T2's result overwrites T1's, so the final value is 1 instead of 2, and one transaction's update is lost.

    Non-repeatable read : The same data is read multiple times within a transaction. If the data read twice before and after is different, it means that the phenomenon of "non-repeatable read" has occurred.

  3. What are the transaction isolation levels?

    1. READ_UNCOMMITTED (uncommitted read): The lowest isolation level that allows reading uncommitted data changes, which may cause dirty reads, phantom reads, or non-repeatable reads;
    2. READ_COMMITTED (committed read): Allows to read data that has been committed by concurrent transactions, which can prevent dirty reads, but phantom reads or non-repeatable reads may still occur;
    3. REPEATABLE_READ (repeatable read): The results of multiple reads of the same field are consistent, unless the data is modified by its own transaction itself, which can prevent dirty reads and non-repeatable reads, but phantom reads may still occur;
    4. SERIALIZABLE (serialization): The highest isolation level, fully compliant with the ACID isolation level. All transactions are executed one by one, so that there is no possibility of interference between transactions, that is, this level can prevent dirty reads, non-repeatable reads, and phantom reads. But this will seriously affect the performance of the program. Normally this level is not used either.
    Isolation level     Dirty read   Non-repeatable read   Phantom read
    READ-UNCOMMITTED    possible     possible              possible
    READ-COMMITTED      prevented    possible              possible
    REPEATABLE-READ     prevented    prevented             possible
    SERIALIZABLE        prevented    prevented             prevented

    The default isolation level of the MySQL InnoDB storage engine is REPEATABLE-READ (repeatable read).

    It should be noted that, unlike what the SQL standard requires, the InnoDB storage engine uses the Next-Key Lock algorithm under the REPEATABLE-READ isolation level, so phantom reads can be avoided; this differs from other database systems such as SQL Server. Therefore, InnoDB's default REPEATABLE-READ level can fully meet the isolation requirements of transactions, effectively reaching the SERIALIZABLE level of the SQL standard.
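
    The isolation level can be inspected and changed per session or globally; a small sketch (the @@transaction_isolation variable applies to MySQL 5.7.20+ and 8.0, older versions use @@tx_isolation):

      -- Check the isolation level of the current session
      SELECT @@transaction_isolation;

      -- Change it for the current session or for the whole server
      SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
      SET GLOBAL  TRANSACTION ISOLATION LEVEL REPEATABLE READ;
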

  4. The role of Read View

    Read View has four important fields:

    • m_ids: refers to the list of transaction ids of "active transactions" in the current database when Read View is created . Note that it is a list. "Active transactions" refer to transactions that have been started but not yet committed .
    • min_trx_id : Refers to the transaction with the smallest transaction id among the "active transactions" in the current database when Read View is created, that is, the minimum value of m_ids.
    • max_trx_id: This is not the maximum value of m_ids, but the id value that should be given to the next transaction in the current database when creating Read View , that is, the largest transaction id value in the global transaction + 1;
    • creator_trx_id: refers to the transaction id of the transaction that created the Read View .

    For database tables using the InnoDB storage engine, its clustered index records contain the following two hidden columns:

    • trx_id, when a transaction modifies a clustered index record, the transaction id of the transaction will be recorded in the trx_id hidden column ;
    • roll_pointer: every time a clustered index record is changed, the old version is written to the undo log, and this hidden column is a pointer to the old version of the record, so it can be used to find earlier versions of the record.

    Controlling the behavior of concurrent transactions accessing the same record through the "version chain" is called MVCC (multi-version concurrency control).

  5. Does the MySQL repeatable read level completely solve phantom reading?

    • For snapshot reads (ordinary select statements), phantom reads are solved through MVCC : under the repeatable read isolation level, the data a transaction sees during execution is always consistent with what it saw when it started. Even if another transaction inserts a row in the meantime, that row is not visible to this transaction's queries, which neatly avoids phantom reads.
    • For current reads (select ... for update and similar statements), phantom reads are solved with next-key locks (record lock + gap lock) : when a select ... for update statement executes, a next-key lock is taken, and if another transaction tries to insert a record within the locked range, its insert is blocked and cannot succeed, which again avoids phantom reads.

    For transactions at the "read committed" and "repeatable read" isolation levels, they are implemented through Read View, and their difference lies in the timing of creating Read View:

    • The "read commit" isolation level is to generate a new Read View for each select, which also means that the same piece of data is read multiple times during the transaction, and the data read twice before and after may be inconsistent, because it may be different during this period. A transaction modifies the record and commits the transaction.
    • The "repeatable read" isolation level is to generate a Read View when a transaction is started, and then use this Read View during the entire transaction, so as to ensure that the data read during the transaction are all records before the transaction was started.

    The implementation of these two isolation levels is to control the behavior of concurrent transactions accessing the same record through the comparison of "fields in the Read View of the transaction" and "two hidden columns in the record". This is called MVCC (multi-version concurrency control).

    Two examples of phantom reading scenarios.

    The first example: for snapshot reads, MVCC cannot completely avoid phantom reads. If transaction A updates a record that transaction B inserted, the result sets of A's queries before and after the update differ, so a phantom read occurs.

    The second example: for current reads, if a transaction first performs a snapshot read instead of a current read right after it starts, and another transaction inserts a record in the meantime, then when the transaction later uses a current read for the same query it will find that the two queries return different record sets, so a phantom read occurs.

    Therefore, the MySQL repeatable read isolation level does not completely solve phantom reading, but largely avoids the occurrence of phantom reading.
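
    The first scenario can be reproduced with a sketch like the following (the table t and its columns are hypothetical). Transaction A's update turns the row inserted by B into a version created by A itself, so A's next snapshot read suddenly "sees" it:

      -- Transaction A                                   -- Transaction B
      BEGIN;
      SELECT * FROM t WHERE id = 5;       -- empty set
                                                         BEGIN;
                                                         INSERT INTO t (id, v) VALUES (5, 'x');
                                                         COMMIT;
      SELECT * FROM t WHERE id = 5;       -- still empty (snapshot read, MVCC)
      UPDATE t SET v = 'y' WHERE id = 5;  -- current read: succeeds, the row's trx_id becomes A's
      SELECT * FROM t WHERE id = 5;       -- now returns the row: a phantom read
      COMMIT;
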

  6. Why is there a transaction rollback mechanism in MySQL

    In MySQL, the recovery mechanism is implemented through the rollback log (undo log). All modifications made by transactions will be recorded in the rollback log first, and then the corresponding rows in the database will be written. Once a transaction has been committed, it cannot be rolled back again.

    Functions of the rollback log: 1) it provides the information needed to roll back when an error occurs or the user executes ROLLBACK; 2) if the whole system crashes or the database process is killed, then when the database process is started again it can consult the rollback log to roll back the transactions that were left unfinished. This requires the rollback log to be persisted to disk before the data itself, which is the main reason for writing the log first and the database afterwards.

3. Storage engine

  1. Introduction to InnoDB

    InnoDB is the preferred engine for transactional databases, supports transaction security tables (ACID), supports row locking and foreign keys, and InnoDB is the default MySQL engine.

    The main features of InnoDB are:

    1. InnoDB provides MySQL with a transaction-safe (ACID-compliant) storage engine with commit, rollback, and crash recovery capabilities.

      InnoDB locks at the row level and also provides an Oracle-like non-locking read in the SELECT statement. These features increase multi-user deployment and performance. In SQL queries, you can freely mix InnoDB tables with other MySQL table types, even in the same query.

    2. InnoDB is designed for maximum performance when handling huge data volumes. Its CPU efficiency is probably unmatched by any other disk-based relational database engine.

    3. The InnoDB storage engine is fully integrated with the MySQL server. The InnoDB storage engine maintains its own buffer pool for caching data and indexes in main memory. InnoDB organizes its tables and indexes in a logical tablespace, which can contain several files (or raw disk files). This is different from MyISAM tables, for example, where each table is stored in a separate file. InnoDB tables can be of any size, even on operating systems where file sizes are limited to 2GB.

    4. InnoDB supports foreign key integrity constraints. When storing data in a table, each table's rows are stored in primary key order. If no primary key is specified when the table is defined, InnoDB generates a 6-byte ROWID for each row and uses it as the primary key.

    5. InnoDB is used on many large database sites that require high performance. InnoDB does not create a directory. When using InnoDB, MySQL will create a 10MB auto-extending data file named ibdata1 and two 5MB log files named ib_logfile0 and ib_logfile1 under the MySQL data directory.

  2. Introduction to MyISAM

    MyISAM is based on the ISAM storage engine and extends it. It is one of the most commonly used storage engines in web, data warehousing and other application environments. MyISAM has a high insertion and query speed, but does not support transactions.

    The main features of MyISAM are:

    1. Large files (up to 63-bit file length) are supported on filesystems and operating systems that support large files.
    2. Dynamically sized rows are less fragmented when deletes are mixed with update and insert operations. This is done automatically by merging adjacent deleted blocks and extending to the next block if the next block is deleted.
    3. The maximum number of indexes per MyISAM table is 64, which can be changed by recompiling. The maximum number of columns per index is 16.
    4. The maximum key length is 1000 bytes, which can also be changed by recompiling. For keys longer than 250 bytes, a larger key block size than the default of 1024 bytes is used.
    5. BLOB and TEXT columns can be indexed.
    6. NULL is allowed in indexed columns, this value occupies 0~1 bytes per key.
    7. All numeric key values are stored with the high byte first, which allows better index compression.
    8. Each MyISAM table supports one internal AUTO_INCREMENT column, which MyISAM updates automatically on INSERT and UPDATE operations. As a result, AUTO_INCREMENT handling in MyISAM tables is faster than in InnoDB tables.
    9. Data files and index files can be placed in different directories.
    10. Each character column can have a different character set.
    11. Tables with VARCHAR can have fixed or dynamic record lengths.
    12. VARCHAR and CHAR columns can be as large as 64KB.
  3. Introduction to MEMORY

    The MEMORY storage engine stores table data in memory, providing very fast access for lookup and reference data.

    The main features of MEMORY are:

    1. MEMORY tables can have up to 32 indexes per table, 16 columns per index, and a maximum key length of 500 bytes.
    2. The MEMORY storage engine supports both HASH and BTREE indexes.
    3. Non-unique key values are allowed in a MEMORY table.
    4. MEMORY tables use a fixed record length format.
    5. MEMORY does not support BLOB or TEXT columns.
    6. MEMORY supports AUTO_INCREMENT columns and indexes on columns that can contain NULL values.
    7. MEMORY tables are shared among all clients (like any other non-TEMPORARY table).
    8. The contents of MEMORY tables are stored in memory, which is shared with the internal temporary tables the server creates while processing queries.
    9. To release the memory used by the MEMORY table when the contents of the MEMORY table are no longer needed, you should execute DELETE FROM or TRUNCATE TABLE, or delete the entire table (using DROP TABLE).
  4. Archive introduction

    As its name suggests, the Archive storage engine is mainly used for archiving data. It supports only select and insert; its most outstanding characteristics are fast inserts and a very small footprint.

    File System Storage Features

    • Table data is compressed with zlib, so there is less disk I/O (data that occupies several TB in InnoDB may take only a few hundred MB as an Archive table)
    • Data is stored in files with .ARZ suffix
    • .frm file

    Features

    • Only insert, replace and select are supported
    • Supports row-level locks and dedicated buffers to achieve high concurrency
    • Indexes are only allowed on self-incrementing ID columns
    • Partitioning is supported, transaction processing is not supported
  5. The difference between database engine InnoDB and MyISAM

    InnoDB

    • It is the default transactional storage engine of MySQL. Only when you need features that it does not support, consider using other storage engines.
    • Implemented four standard isolation levels, the default level is repeatable read (REPEATABLE READ). Under the repeatable read isolation level, phantom reads are prevented through multi-version concurrency control (MVCC) + gap lock (Next-Key Locking).
    • The main index is a clustered index, which saves data in the index to avoid directly reading the disk, so it greatly improves the query performance.
    • Many internal optimizations have been made, including read-ahead (prefetching) when reading data from disk, an adaptive hash index that is created automatically to speed up read operations, and an insert buffer that speeds up insert operations.
    • Support real online hot backup. Other storage engines do not support online hot backup. To obtain a consistent view, you need to stop writing to all tables. In a read-write mixed scenario, stopping writing may also mean stopping reading.

    MyISAM

    • The design is simple, and the data is stored in a compact format. It can still be used for read-only data, or if the table is small enough to tolerate repair operations.
    • Provides a large number of features, including compressed tables, spatial data indexes, and more.
    • Transactions are not supported.
    • It does not support row-level locks, and can only lock the entire table. When reading, it will add a shared lock to all tables that need to be read, and when writing, it will add an exclusive lock to the table. However, while the table has read operations, new records can also be inserted into the table, which is called concurrent insert (CONCURRENT INSERT).

    Summary

    • Transactions: InnoDB is transactional; COMMIT and ROLLBACK statements can be used.
    • Concurrency: MyISAM only supports table-level locks, while InnoDB also supports row-level locks.
    • Foreign keys: InnoDB supports foreign keys.
    • Backup: InnoDB supports online hot backup.
    • Crash recovery: MyISAM has a much higher probability of corruption after a crash than InnoDB, and recovery is slower.
    • Other features: MyISAM supports compressed tables and spatial data indexes.

    Applicable scenarios : MyISAM suits workloads where inserts are infrequent and queries are very frequent; if you run a large number of SELECTs and do not need transactions, MyISAM is the better choice. InnoDB suits workloads with high reliability requirements or that need transactions, and tables that are updated and queried frequently with many INSERTs or UPDATEs.
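
    The storage engine is chosen per table; a small sketch (table names are hypothetical):

      -- List the engines supported by the server and the default one
      SHOW ENGINES;

      -- Choose the engine when creating a table
      CREATE TABLE order_log (id INT PRIMARY KEY, msg VARCHAR(255)) ENGINE = MyISAM;

      -- Convert an existing table to InnoDB
      ALTER TABLE order_log ENGINE = InnoDB;
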

4. Lock mechanism

  1. What locks does MySQL have (global locks/table-level locks/row-level locks)

    Global lock

    A global lock locks the entire database instance; in MySQL it is typically taken with FLUSH TABLES WITH READ LOCK and is mainly used to obtain a consistent view for full-database logical backups.

    MyISAM supports only table locks; InnoDB supports both table locks and row locks, and uses row locks by default.

    Table-level locks: low overhead, fast locking, and no deadlocks. The locking granularity is large, the probability of lock conflicts is the highest, and the amount of concurrency is the lowest.

    1. Table locks: table-level locks
    2. Metadata lock: MDL stands for metadata lock, also commonly called a dictionary lock. Its main role is to manage concurrent access to database objects and to keep metadata consistent.
    3. Intent locks: Intention locks are locks placed on one level of a resource hierarchy to protect shared or exclusive locks on lower-level resources.
    4. AUTO-INC lock: the AUTO-INC lock is a special table-level lock mechanism. Unlike ordinary locks, it is not held until the transaction commits; it is released as soon as the INSERT statement that needed it finishes.

    Row-level locks: high overhead, slow locking, and deadlocks. The lock granularity is small, the probability of lock conflicts is small, and the concurrency is the highest.

    1. Record lock: a lock on a single row record;
    2. Gap lock: locks a gap (a range between index records), excluding the records themselves;
    3. Next-key lock: record+gap locks a range, including the record itself.
    4. Insertion intent lock: Although the name of the insertion intent lock has an intent lock, it is not an intent lock. It is a special gap lock that belongs to the row-level lock.
  2. How does MySQL lock

    Locking rules for MySQL row-level locks.

    Unique index equivalent query:

    • When the queried record "exists": after locating the record on the index tree, the next-key lock on that record's index degenerates into a "record lock".
    • When the queried record "does not exist": after the index tree locates the first record larger than the queried value, the next-key lock on that record's index degenerates into a "gap lock".

    Non-unique index equivalent query:

    • When the queried record "exists": because the index is not unique, there may be several records with the same index value, so an equality query on a non-unique index is a scan that stops at the first secondary index record that does not match. During the scan, a next-key lock is added to every scanned secondary index record; for the first non-matching secondary index record, the next-key lock degenerates into a gap lock. At the same time, a record lock is added on the primary key index of every record that matches the query condition.
    • When the queried record "does not exist", the first unqualified secondary index record is scanned, and the next-key lock of the secondary index will degenerate into a gap lock. Because there is no record that meets the query condition, the primary key index will not be locked .

    The locking rules for range queries of non-unique indexes and primary key indexes are different in that:

    • When a unique index meets certain conditions, the next-key lock of the index degenerates into a gap lock or a record lock.
    • For non-unique index range queries, the next-key lock of the index will not degenerate into gap locks and record locks.
  3. MySQL record lock + gap lock to solve phantom reading problem

    Under MySQL's repeatable read isolation level, record locks + gap locks will be added to the index for the current read statement , which can avoid the problem of phantom reads caused by other transactions when adding, deleting, and modifying.

    One thing to note: when executing update, delete, select ... for update and other statements with locking semantics, check whether the statement uses an index. If it performs a full table scan, a next-key lock is added to every record, which is equivalent to locking the entire table; this is a serious problem.
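
    A sketch of the difference, assuming a hypothetical table t with a primary key id and a non-indexed column status:

      -- Locks only the range around id = 5 (record/next-key lock on the primary key index)
      SELECT * FROM t WHERE id = 5 FOR UPDATE;

      -- status has no index: the scan places next-key locks on every record,
      -- which is effectively a whole-table lock
      SELECT * FROM t WHERE status = 'NEW' FOR UPDATE;
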

  4. Four necessary conditions for deadlock

    1. Mutual exclusion: a resource can be used by only one process at a time;

    2. Hold and wait: a process that is blocked while requesting resources keeps holding the resources it has already acquired;

    3. No preemption: resources a process has acquired cannot be forcibly taken away before it has finished using them;

    4. Circular wait: several processes form a head-to-tail cycle, each waiting for a resource held by the next.

  5. How to solve MySQL deadlock problem

    A deadlock is a situation in which two or more transactions each hold locks on resources the others need and each requests a lock held by the other, creating a cycle in which none of them can proceed.

    Common ways to solve deadlocks

    • If different programs access multiple tables concurrently, try to agree to access the tables in the same order , which can greatly reduce the chance of deadlock;
    • In the same transaction, try to lock all the resources needed at one time to reduce the probability of deadlock;
    • For business logic that is very prone to deadlocks, you can try coarsening the locking granularity, for example by using table-level locks, to reduce the probability of deadlock.
  6. Principles and application scenarios of database pessimistic locking and optimistic locking

    Pessimistic locks first acquire locks and then perform business operations. Generally, statements such as SELECT ... FOR UPDATE are used to lock data to prevent other transactions from accidentally modifying data. When the database executes SELECT ... FOR UPDATE, it will acquire the row lock of the data row in the select, and the row lock acquired by select for update will be released automatically at the end of the current transaction, so it must be used in the transaction.

    Optimistic locking performs the business operation first and only checks, when the data is finally updated, whether it has been changed in the meantime. This is similar to the atomic field updaters in the Java concurrency package, which use a CAS mechanism: the data is not locked; instead a timestamp or version number on the data is compared to make the version check that optimistic locking requires.
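
    A minimal sketch of both approaches, assuming a hypothetical account table with a version column for the optimistic case:

      -- Pessimistic locking: take the row lock first, then do the business update
      BEGIN;
      SELECT balance FROM account WHERE id = 1 FOR UPDATE;   -- acquires the row lock
      UPDATE account SET balance = balance - 100 WHERE id = 1;
      COMMIT;

      -- Optimistic locking: no lock while reading; verify the version when writing
      SELECT balance, version FROM account WHERE id = 1;      -- suppose version = 7
      UPDATE account
         SET balance = balance - 100, version = version + 1
       WHERE id = 1 AND version = 7;
      -- if 0 rows were affected, another transaction got there first: retry or abort
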

5. Other knowledge points of MySQL

  1. Which two parts can the internal structure of MySQL generally be divided into?

    It can be divided into two parts: the service layer and the storage engine layer, among which:

    The service layer includes the connector, query cache, parser (analyzer), optimizer, executor, and so on. It covers most of MySQL's core service functionality and all built-in functions (such as date, time, math, and encryption functions); all cross-storage-engine functionality, such as stored procedures, triggers, and views, is implemented in this layer.

    The storage engine layer is responsible for data storage and retrieval . Its architecture mode is plug-in, and supports multiple storage engines such as InnoDB, MyISAM, and Memory. The most commonly used storage engine is InnoDB, which has become the default storage engine since MySQL 5.5.5.

  2. What are the uses of undo log, redo log, and binlog

    The redo log is unique to the InnoDB engine and only records the modification records of the tables in the engine. Binlog is implemented by the server layer of MySQL, and will record all engine modifications to the database.

    The redo log is a physical log, which records what changes have been made on a specific data page; the binlog is a logical log, which records the original logic of this statement.

    The redo log is written in a circular fashion, so its space is eventually reused; the binlog is written by appending, and once a binlog file reaches a certain size it switches to the next file without overwriting earlier logs.

    Additional notes

    1. The redo log records what was modified (which page changed); it is written while the transaction executes and is used to recover data after the database crashes before dirty pages have been flushed to disk.
    2. The binlog records the modifying SQL; it is written when the transaction commits and can be used for replication and read/write splitting.
    3. The undo log records the data as it was before modification; it is used for rollback and for multi-version concurrency control (MVCC).
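
    How aggressively these logs are flushed to disk is controlled by server variables; a small sketch of inspecting them:

      SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';  -- 1 = flush the redo log on every commit (safest)
      SHOW VARIABLES LIKE 'sync_binlog';                     -- 1 = fsync the binlog on every commit
      SHOW VARIABLES LIKE 'log_bin';                         -- whether the binary log is enabled
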

  3. What is Buffer pool

    The InnoDB storage engine uses a buffer pool (Buffer Pool) to improve the read and write performance of the database.

    • When reading data, if the data exists in the Buffer Pool, the client will directly read the data in the Buffer Pool, otherwise it will read it from the disk.
    • When modifying data, first modify the page where the data in the Buffer Pool is located, then set its page as a dirty page, and finally write the dirty page to disk by a background thread.

    What is cached

    InnoDB will divide the stored data into several "pages", and use the page as the basic unit of disk and memory interaction. The default size of a page is 16KB. Therefore, Buffer Pool also needs to be divided by "page".

    When MySQL starts, InnoDB requests a contiguous block of memory for the Buffer Pool and divides it into pages according to the default 16KB page size. The pages in the Buffer Pool are called cache pages . They are all free at first; as the program runs, pages on disk are gradually cached into the Buffer Pool.

    InnoDB manages the cache pages through three linked lists:

    • Free List (free page linked list), manage free pages;
    • Flush List (dirty page linked list), manage dirty pages;
    • LRU List, which manages both dirty and clean pages: it keeps recently and frequently queried data cached, and evicts data that is rarely queried.

    InnoDB has made some optimizations to LRU. The LRU algorithm we are familiar with usually puts the most recently queried data at the head of the LRU linked list, while InnoDB does 2 optimizations:

    • The LRU linked list is divided into two areas , young and old. Pages newly added to the buffer pool are inserted into the old area first and move to the young area only when they are accessed. The purpose is to solve the problem of read-ahead failure (pages prefetched but never used).
    • A page is promoted to the young area only when it is accessed again and has stayed in the old area longer than the innodb_old_blocks_time threshold (1 second by default); otherwise it stays in the old area. The purpose is to prevent a large batch scan from evicting a lot of hot data.

    The proportion of the young and old areas can be adjusted with the innodb_old_blocks_pct parameter.
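
    These behaviors can be observed and tuned through server variables; a small sketch:

      SHOW VARIABLES LIKE 'innodb_buffer_pool_size';   -- total size of the Buffer Pool
      SHOW VARIABLES LIKE 'innodb_page_size';          -- page size, 16KB by default
      SHOW VARIABLES LIKE 'innodb_old_blocks_pct';     -- proportion of the old area (default 37)
      SHOW VARIABLES LIKE 'innodb_old_blocks_time';    -- old-area residence threshold in ms (default 1000)
      SHOW ENGINE INNODB STATUS\G                      -- includes BUFFER POOL AND MEMORY statistics
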

  4. The difference between DROP, DELETE and TRUNCATE

    All three can represent deletion, and the subtle differences are as follows:

    • DROP: a DDL statement; cannot be rolled back; removes the table itself from the database, together with all data rows, indexes, and privileges; the fastest of the three.
    • DELETE: a DML statement; can be rolled back; the table structure remains, and all or part of the data rows are deleted; the slowest, since rows are deleted one by one.
    • TRUNCATE: a DDL statement; cannot be rolled back; the table structure remains, and all data in the table is deleted; fast.

    Therefore, use DROP when you no longer need a table; use DELETE when you want to delete some data rows; use TRUNCATE when you want to delete all data while keeping the table.
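
    A quick illustration with a hypothetical user table:

      DELETE FROM user WHERE id > 100;   -- DML: removes the matching rows, can be rolled back
      TRUNCATE TABLE user;               -- DDL: empties the table but keeps its structure
      DROP TABLE user;                   -- DDL: removes the table itself
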

  5. What is the difference between inner join, self join, outer join (left, right, full) and cross join in SQL syntax?

    1. Inner join: only rows that match in both joined tables appear in the result set.

    2. Outer join:

    • Left outer join: the left table is the driving table; all of its rows are returned, while rows of the other table that have no match are not shown (the corresponding columns are NULL).

    • Right outer join: the right table is the driving table; all of its rows are returned, while rows of the other table that have no match are not shown (the corresponding columns are NULL).

    • Full outer join: all rows from both joined tables are returned, whether or not they match.

    3. Cross join: a Cartesian product; the result contains every combination of rows from the joined tables.
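
    A sketch of the join types, assuming hypothetical user and orders tables (note that MySQL has no FULL OUTER JOIN; it is usually emulated with a UNION of a left and a right join):

      -- Inner join: only matching rows
      SELECT u.name, o.amount FROM user u INNER JOIN orders o ON o.user_id = u.id;

      -- Left outer join: every user, NULL amount for users without orders
      SELECT u.name, o.amount FROM user u LEFT JOIN orders o ON o.user_id = u.id;

      -- Right outer join: every order, NULL name for orders without a matching user
      SELECT u.name, o.amount FROM user u RIGHT JOIN orders o ON o.user_id = u.id;

      -- "Full outer join" emulated with UNION
      SELECT u.name, o.amount FROM user u LEFT JOIN orders o ON o.user_id = u.id
      UNION
      SELECT u.name, o.amount FROM user u RIGHT JOIN orders o ON o.user_id = u.id;

      -- Cross join: Cartesian product of the two tables
      SELECT u.name, o.amount FROM user u CROSS JOIN orders o;
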
  6. What are the differences between CHAR and VARCHAR in MySQL

    • CHAR has a fixed length and is padded with spaces to the declared length, while the length of VARCHAR is variable.
    • CHAR is generally somewhat faster to access than VARCHAR.
    • The actual number of bytes per character depends on the character set (for example, one byte for an ASCII character and three or four bytes for a Chinese character in utf8/utf8mb4); in addition, VARCHAR uses one or two extra bytes to record the length of the value, while CHAR does not.
  7. What are the primary keys, super keys, candidate keys, and foreign keys in the database

    • Super key : a set of attributes that can uniquely identify a tuple in a relation is called a super key of the relational schema.
    • Candidate key : a super key that contains no redundant attributes; that is, no attribute can be removed from a candidate key without it ceasing to be a key.
    • Primary key : the candidate key chosen by the user as the tuple identifier.
    • Foreign key : if attribute K in relational schema R is the primary key of another schema, then K is called a foreign key of schema R.

    The primary key is a subset of the candidate key, the candidate key is a subset of the super key, and the determination of the foreign key is relative to the primary key.
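
    A small sketch of a primary key and a foreign key (the tables are hypothetical):

      CREATE TABLE department (
        id   INT PRIMARY KEY,               -- primary key (one of the candidate keys)
        name VARCHAR(50) NOT NULL UNIQUE    -- another candidate key
      ) ENGINE = InnoDB;

      CREATE TABLE employee (
        id      INT PRIMARY KEY,
        dept_id INT,
        FOREIGN KEY (dept_id) REFERENCES department (id)  -- foreign key referencing department's primary key
      ) ENGINE = InnoDB;
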

  8. MySQL optimization

    • Create an index for the search field
    • Avoid using Select *, list the fields that need to be queried
    • vertical split table
    • Choose the right storage engine
  9. SQL statement execution process

    The steps for the server layer to execute sql in sequence are:

    1. client request ->
    2. Connector (authenticate user, grant permissions) ->
    3. Query the cache (return directly if there is a cache, and perform subsequent operations if it does not exist) ->
    4. Analyzer (lexical analysis and syntax analysis operations on SQL) ->
    5. Optimizer (mainly select the optimal execution plan method for the executed sql optimization) ->
    6. Executor (when executing, it will first check whether the user has execution permission, and then use the interface provided by this engine) ->
    7. Go to the engine layer to get the data back (if the query cache is enabled, the query results will be cached)

    Brief summary:

    • Connector : management connection, authority verification;
    • Query cache : if the cache is hit, the result will be returned directly;
    • Analyzer : perform lexical analysis and syntax analysis on SQL; (judging whether the query SQL field exists is also in this step)
    • Optimizer : execution plan generation, index selection;
    • Executor : operate the engine and return the result;
    • Storage engine : store data and provide read and write interfaces.
  10. What are the three normal forms of database design

    1. First normal form: emphasizes the atomicity of columns, i.e. every column in a database table is an indivisible atomic data item;
    2. Second normal form: requires that every non-key attribute fully depend on the primary key; "full dependence" means there must be no attribute that depends on only part of the primary key;
    3. Third normal form: no non-key attribute may depend on another non-key attribute.
  11. Understanding of MVCC

    Database concurrency scenario:

    1. Read-read: there is no problem and no need for concurrency control;
    2. Read-write: There are thread safety issues, which may cause transaction isolation issues, and may encounter dirty reads, phantom reads, and non-repeatable reads;
    3. Write-Write: There are thread safety issues, and there may be a problem of lost updates.

    Multi-version concurrency control (MVCC) is a lock-free concurrency control used to resolve read-write conflicts, that is, assign a one-way increasing timestamp to a transaction, save a version for each modification, and the version is associated with the transaction timestamp. Read operations only read the snapshot of the database before the transaction started.

    MVCC can solve the following problems for databases:

    1. When reading and writing the database concurrently, it is possible to avoid blocking the writing operation during the reading operation, and the writing operation does not need to block the reading operation, which improves the performance of the concurrent reading and writing of the database;
    2. At the same time, it can also solve transaction isolation problems such as dirty reads, phantom reads, and non-repeatable reads, but it cannot solve the problem of lost updates.
  12. Which three threads are involved in master-slave replication?

    Three threads are mainly involved: binlog thread, I/O thread and SQL thread.

    1. binlog thread: Responsible for writing data changes on the master server to the binary log (Binary log).
    2. I/O thread: responsible for reading the binary log from the master server and writing it to the slave server's relay log.
    3. SQL thread: responsible for reading the relay log and replaying the SQL statements in it.
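
    A sketch of how these pieces are wired together (the statements use the classic MASTER/SLAVE terminology of MySQL 5.x; hosts and credentials are placeholders):

      -- On the master: confirm the binlog is enabled and note the current file and position
      SHOW MASTER STATUS;

      -- On the slave: point it at the master and start the I/O and SQL threads
      CHANGE MASTER TO
          MASTER_HOST = '192.0.2.10',
          MASTER_USER = 'repl',
          MASTER_PASSWORD = '...',
          MASTER_LOG_FILE = 'binlog.000001',
          MASTER_LOG_POS  = 154;
      START SLAVE;

      -- Check that Slave_IO_Running and Slave_SQL_Running are both "Yes"
      SHOW SLAVE STATUS\G
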
  13. How the database guarantees persistence

    This mainly relies on InnoDB's redo log . As mentioned before, MySQL first loads data pages from disk into memory, modifies the data in memory, and writes it back to disk later. If the server crashes suddenly at that point, the modified data in memory is lost. How can this be solved? A naive answer is to write the data straight to disk before the transaction commits. What is wrong with doing that?

    • Even to modify a single byte in a page, the whole page has to be flushed to disk, which wastes resources: a page is 16KB, and flushing 16KB to disk for a tiny change is unreasonable.
    • The SQL statements in a transaction may modify many data pages, and those pages are usually not adjacent, so the writes are random I/O, which is obviously slower.

    Therefore, the redo log is used to solve these problems. When data is modified, the change is not only made in memory but is also recorded in the redo log . When the transaction commits, the redo log is flushed to disk (part of the redo log is in memory and part is on disk). If the database crashes and restarts, the contents of the redo log are replayed into the database, and the data is then rolled back or committed according to the undo log and the binlog .

    The benefits of using redo log?

    The advantage is that flushing the redo log is much more efficient than flushing the data pages, specifically:

    • The redo log is small: it only records which page was modified and how, so it is quick to flush.
    • The redo log is always appended at the end, which is sequential I/O and therefore clearly faster than random I/O.
  14. How does the database guarantee atomicity

    This mainly relies on InnoDB's undo log . The undo log (rollback log) is the key to atomicity: when a transaction is rolled back, all the SQL statements that have already executed successfully can be undone. To make that possible, the information needed for the rollback has to be recorded. For example:

    • When you delete a piece of data, you need to record the information of this data, and when you roll back, insert this old data
    • When you update a piece of data, you need to record the previous old value. When rolling back, perform the update operation based on the old value
    • When inserting a piece of data, the primary key of this record is needed. When rolling back, delete operation is performed according to the primary key

    The undo log records the information required for these rollbacks. When the transaction execution fails or rollback is called, causing the transaction to be rolled back, the information in the undo log can be used to roll back the data to the state before the modification.

  15. How does the database ensure consistency

    • At the database level , the database guarantees consistency through atomicity, isolation, and durability. In other words, among the four ACID properties, C (consistency) is the goal, while A (atomicity), I (isolation), and D (durability) are the means the database provides to achieve it; the database must implement A, I, and D in order to achieve consistency. For example, if atomicity cannot be guaranteed, then consistency obviously cannot be guaranteed either.
    • At the application level , the code checks whether the data in the database is valid and then decides whether to roll back or commit it.
  16. Solutions for high database concurrency

    • Add caching to the web service framework. A cache layer is added between the server and the database layer to store frequently accessed data in the cache and reduce the load on the database.
    • Increase the database index, thereby improving the query speed. (However, too many indexes will slow down the speed, and the writing of the database will cause the update of the index, which will also slow down the speed)
    • Master-slave read and write separation, let the master server be responsible for writing, and the slave server is responsible for reading.
    • Split the database to make the database table as small as possible to improve the query speed.
    • Use a distributed architecture to disperse computing pressure.
  17. The means of database structure optimization

    • Normalization : e.g. eliminating redundancy (saves space, ...)
    • Denormalization : e.g. adding redundancy where appropriate (reduces joins)
    • Limit the range of data : Be sure to prohibit the query statement without any conditions to limit the range of data. For example: when users query the order history, we can control it within a month.
    • Read/write separation : classic database splitting scheme, the main library is responsible for writing, and the slave library is responsible for reading;
    • Split tables / partitioning : partitioning physically separates the data, and data in different partitions can be stored in data files on different disks. When the table is queried, only the relevant partitions need to be scanned instead of the whole table, which significantly shortens query time; partitions on different disks also spread the table's data transfer across the disks, so a well-designed partitioning scheme can evenly distribute disk I/O competition. This approach suits tables with a large amount of time-based data, and partitions can be created automatically by month.
  18. The difference between relational and non-relational databases

    Non-relational databases are also called NOSQL, which are stored in the form of key-value pairs.

    It has high read and write performance and is easy to expand. It can be divided into memory database and document database, such as Redis, Mongodb, HBase and so on.

    Scenarios for using non-relational databases:

    • Log system, geographic location storage, huge data volume, high availability
    1. Advantages of relational databases
      • easy to understand. Because it uses a relational model to organize data.
      • Data consistency can be maintained.
      • Data update overhead is relatively small.
      • Support for complex queries (queries with where clauses)
    2. Advantages of non-relational databases
      • It does not need to be parsed by the SQL layer, and the reading and writing efficiency is high.
      • Based on key-value pairs, data scalability is very good.
      • Can support the storage of various types of data, such as pictures, documents, etc.
  19. Why should the database be divided into databases and tables

    The purpose of sub-database and sub-table is to reduce the burden of single database and single table of the database, improve query performance, and shorten query time.

    By splitting tables , the burden on a single table can be reduced and the pressure spread across different tables; because each table then holds less data, query performance improves and query time shortens, and the problem of table locks is greatly alleviated. Table splitting strategies can be summarized as vertical splitting and horizontal splitting. Horizontal splitting : splitting by modulo is a form of random splitting, while splitting by the time dimension is continuous splitting; for scenarios with a large number of users, modulo-based sub-tables can be considered, since the data is distributed fairly evenly and hot spots or concurrent-access bottlenecks are unlikely. Vertical splitting : a suggested design is to move rarely used fields into a separate extension table, move large text fields into a separate extension table, keep fields that are rarely modified together in one table, and put frequently changing fields in another table.

    Splitting tables within the same database only solves the problem of a single table holding too much data; it does not spread the data across different physical machines, so it cannot reduce the load on the MySQL server. Resource contention and bottlenecks, including CPU, memory, disk I/O, and network bandwidth, still exist on the same physical machine.

    Distributed problems introduced by splitting databases and tables, and countermeasures: for data migration and expansion, the usual approach is to read the data with a program first and then write it into the various sub-tables according to the chosen splitting strategy; for paging and sorting, the data in the different sub-tables must be sorted and returned separately, the result sets returned by the sub-tables merged and re-sorted, and the final result returned to the user.
