Understand MySQL Metadata Lock (MDL) step by step

One day a user came to us for help. To free up space, they wanted to delete the data in a large table of more than 200 GB, and it had been confirmed that no business traffic used the table any more. They ran 'delete from bigtable', but after a long time the statement still had not finished. After asking around, they learned that drop table removes a table quickly and frees the space completely, so they ran 'drop table bigtable' in another session. That command did not return either, and the cursor just hung. Finally they turned to us. After logging in to the database and running 'show processlist', we found that the drop statement was in the state 'Waiting for table metadata lock', while the earlier delete statement was still visible in the state 'updating'.

[Screenshot: show processlist output]

 

What exactly is a metadata lock? How does this kind of lock wait arise? What is the impact, and how do you resolve it? Today we answer six frequently asked questions.

 

1. What is a metadata lock?

Before MySQL 5.5.3 there was a famous bug, #989, which went roughly as follows:

 session1:  
 BEGIN;
 INSERT INTO t ... ;
 COMMIT;  

 session2:  
 DROP TABLE t;

However, the statements were written to the binlog in this order:

 DROP TABLE t; 
 BEGIN;  
 INSERT INTO t ... ; 
 COMMIT;

Obviously, when the replica applies the binlog, table t is dropped first, and the subsequent INSERT then fails because the table no longer exists, which interrupts replication. To fix this bug, MySQL 5.5.3 introduced the metadata lock (MDL), which protects a table's metadata and keeps DDL and DML operations consistent with each other.

To give another simple example: if you are querying a table and another session drops a column from it, what should the running query return? And if the same statement is executed again in the same transaction under the RR (repeatable read) isolation level, will the result match the first one? To prevent this situation, MySQL takes a lock on the table when a query starts so that other sessions cannot change the table definition. This lock is called the 'metadata lock', or MDL for short.

 

2. What is the difference between MDL and row lock?

The metadata lock is a table-level lock. It is taken at the server layer and applies to all storage engines. Every DML operation takes a metadata read lock on the table; every DDL operation takes a metadata write lock on the table. Read locks and write locks block each other as follows:

  • Read locks and write locks block each other, that is, DML and DDL on the same table block each other.
  • Write locks block each other, that is, two sessions cannot change the definition of the same table at the same time; the changes must run one after another.
  • Read locks do not block each other. Inserts, deletes, updates and selects are never blocked by one another's metadata locks and can run concurrently. The lock waits between DML statements that you see in daily work are caused by InnoDB row locks and have nothing to do with metadata locks.

 

Readers who are familiar with InnoDB row locks may be a little confused here, because row locks resemble metadata locks: both are divided into read locks and write locks (shared and exclusive locks), and the blocking relationship between read and write locks is the same. The most important differences are that one is a table-level lock and the other a row-level lock, and that both read and write operations in row-lock terms take a read lock in metadata-lock terms.

 

You may be surprised: you may have heard that ordinary queries take no locks, so why do they take a table-level lock here? Let's run a simple test:

session1: Before running the query, look at the metadata_locks table. It lives in performance_schema and records metadata lock information.

mysql> select * from performance_schema.metadata_locks ;
+-------------+--------------------+----------------+-------------+-----------------------+-------------+---------------+-------------+-------------------+-----------------+----------------+
| OBJECT_TYPE | OBJECT_SCHEMA      | OBJECT_NAME    | COLUMN_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE   | LOCK_DURATION | LOCK_STATUS | SOURCE            | OWNER_THREAD_ID | OWNER_EVENT_ID |
+-------------+--------------------+----------------+-------------+-----------------------+-------------+---------------+-------------+-------------------+-----------------+----------------+
| TABLE       | performance_schema | metadata_locks | NULL        |       139776223308432 | SHARED_READ | TRANSACTION   | GRANTED     | sql_parse.cc:6014 |              54 |             12 |
+-------------+--------------------+----------------+-------------+-----------------------+-------------+---------------+-------------+-------------------+-----------------+----------------+
1 row in set (0.00 sec)

 

session2: Run a simple query. The sleep function is used to keep the query on the table running for a while.

mysql> select sleep(10) from t1;
+-----------+
| sleep(10) |
+-----------+
|         0 |
|         0 |
|         0 |
+-----------+
3 rows in set (30.00 sec)

session1:

mysql> select * from performance_schema.metadata_locks ;
+-------------+--------------------+----------------+-------------+-----------------------+-------------+---------------+-------------+-------------------+-----------------+----------------+
| OBJECT_TYPE | OBJECT_SCHEMA      | OBJECT_NAME    | COLUMN_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE   | LOCK_DURATION | LOCK_STATUS | SOURCE            | OWNER_THREAD_ID | OWNER_EVENT_ID |
+-------------+--------------------+----------------+-------------+-----------------------+-------------+---------------+-------------+-------------------+-----------------+----------------+
| TABLE       | db1                | t1             | NULL        |       139776154308336 | SHARED_READ | TRANSACTION   | GRANTED     | sql_parse.cc:6014 |              53 |             22 |
| TABLE       | performance_schema | metadata_locks | NULL        |       139776223308432 | SHARED_READ | TRANSACTION   | GRANTED     | sql_parse.cc:6014 |              54 |             13 |
+-------------+--------------------+----------------+-------------+-----------------------+-------------+---------------+-------------+-------------------+-----------------+----------------+
2 rows in set (0.00 sec)

Checking the metadata_locks table again now shows an extra lock record for t1, with lock type SHARED_READ and status GRANTED. So the common saying that a query takes no locks really means that it takes no InnoDB row locks on the table; it still takes a metadata read lock.
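
To watch the locks on a single table instead of scanning the whole table of locks, the query can be narrowed. This is only a minimal sketch; it assumes the db1.t1 table from the test above and that metadata lock instrumentation is enabled on the server:

```sql
-- Show only the metadata locks held or requested on db1.t1.
SELECT object_type,
       object_schema,
       object_name,
       lock_type,
       lock_duration,
       lock_status,
       owner_thread_id
FROM performance_schema.metadata_locks
WHERE object_schema = 'db1'
  AND object_name = 't1';
```

A row with lock_status GRANTED is a lock currently held; a row with PENDING is a session waiting for the lock.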

 

If another session tries to add a column while the sleep query is running, a metadata lock wait occurs:

 

session2:

mysql> select sleep(10) from t1;

In progress...

 

session3:

mysql> alter table t1 add col1 int;

Blocking...

 

session1:

mysql> show processlist;
+----+-----------------+-----------+------+---------+--------+---------------------------------+-----------------------------+
| Id | User            | Host      | db   | Command | Time   | State                           | Info                        |
+----+-----------------+-----------+------+---------+--------+---------------------------------+-----------------------------+
|  4 | event_scheduler | localhost | NULL | Daemon  | 861577 | Waiting on empty queue          | NULL                        |
| 18 | root            | localhost | db1  | Sleep   |     50 |                                 | NULL                        |
| 19 | root            | localhost | NULL | Query   |      0 | starting                        | show processlist            |
| 20 | root            | localhost | db1  | Query   |     11 | Waiting for table metadata lock | alter table t1 add col1 int |
+----+-----------------+-----------+------+---------+--------+---------------------------------+-----------------------------+
4 rows in set (0.00 sec)

Clearly, the thread with id 20 has not yet carried out the alter; its state is 'Waiting for table metadata lock', that is, it is waiting for the sleep query in session2 to finish.
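
On a busy instance the processlist can be long, so it may help to list only the sessions stuck on a metadata lock. A possible sketch using performance_schema.threads, whose PROCESSLIST_* columns mirror show processlist (the column aliases are my own):

```sql
-- List sessions currently waiting for a table metadata lock, longest wait first.
SELECT processlist_id,
       processlist_time AS waiting_seconds,
       processlist_info AS waiting_query
FROM performance_schema.threads
WHERE processlist_state = 'Waiting for table metadata lock'
ORDER BY processlist_time DESC;
```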

 

3. Why does MDL cause the system to crash?

Take a simple example:

  • session1 starts a transaction and runs a simple query on table t1;
  • session2 adds a column to t1;
  • session3 runs a query on t1;
  • session4 runs an update on t1.

The sessions issue their statements one after another, in this order.

 

session1:

mysql> begin;
Query OK, 0 rows affected (0.00 sec)


mysql> select * from t1 where id=1;
+----+------+------+-------+
| id | name | age  | birth |
+----+------+------+-------+
|  1 | aa   |   10 | NULL  |
+----+------+------+-------+
1 row in set (0.00 sec)

session2:

mysql> alter table t1 add col1 int;

Blocking...

 

session3:

mysql> select sleep(10) from t1 ;

Blocking...

 

session4:

mysql> update t1 set name='aaaa' where id=2;

Blocking...

 

In other words, because session1 has not committed its transaction, the DDL of session2 is blocked. session3 and session4 would not be blocked by session1 on their own, but session2 is ahead of them in the lock queue and is waiting for a metadata write lock, so the read locks requested by session3 and session4 are blocked as well. If t1 is a heavily used table, show processlist will show a large number of threads in 'Waiting for table metadata lock', database connections will soon be exhausted, and the business system will stop responding normally.

 

At this point, if session1 commits, does the alter statement of session2 run first, or do session3 and session4? I had always assumed first come, first served, so session2 should run first. In my tests, however, on 5.7 session3 and session4 ran first and session2 ran last, which means an alter can be kept waiting for a long time. On 8.0, session2 ran first and session3 and session4 ran afterwards. Because DDL has been online since 5.6, session2 does not block session3 and session4 while it executes. This order seems more reasonable to me, since the alter is not 'starved to death'.

 

4. How long is the life cycle of MDL?

Transaction! Transaction! Transaction! Important things are worth saying three times. The metadata lock on a table is held from the first statement in the transaction that touches the table until the end of the whole transaction. Before 5.5 the lock was statement-based: it was released as soon as the statement finished, even if the transaction was still open. If another session then dropped a column from the table, two problems would arise:

  • If the DDL completes before the transaction does, the DDL is written to the binlog ahead of the transaction, which contradicts the logical order and triggers the bug mentioned at the beginning of this article.
  • Under the RR isolation level, querying the table a second time in the same transaction would not return the same result, which violates repeatable read.

 

Therefore, if you want to reduce metadata lock wait time, commit transactions promptly and try to avoid large transactions.

 

So when a metadata lock wait does occur, how long will the waiting session wait? As is well known, row lock waits in MySQL have a timeout (parameter innodb_lock_wait_timeout) with a default of 50 seconds. Metadata locks are controlled by a similar parameter:

mysql> show variables like 'lock_wait_timeout'      ;
+-------------------+----------+
| Variable_name     | Value    |
+-------------------+----------+
| lock_wait_timeout | 31536000 |
+-------------------+----------+
1 row in set (0.00 sec)

That is a long number. I counted on my fingers for a while, and it really is... one year (31536000 seconds = 365 days). You could travel around the world and the session would still be waiting!
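
One way to keep a DDL statement from queueing for that long is to lower the timeout in the session that runs it. The following is only a sketch: the 10-second value is an arbitrary assumption, and t1 is the test table used earlier.

```sql
-- Limit how long this session waits for a metadata lock
-- (10 seconds is an illustrative value, not a recommendation).
SET SESSION lock_wait_timeout = 10;

-- If the metadata write lock cannot be acquired within the timeout,
-- the ALTER fails with a lock wait timeout error instead of staying in the queue.
ALTER TABLE t1 ADD COLUMN col1 INT;
```

A DDL that gives up quickly also leaves the lock queue quickly, so it stops blocking the read locks queued behind it.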

 

Of course, in production we rarely wait for a metadata lock to time out. More often we need to track down the session holding the lock and get it to commit or roll back quickly, or kill it. So how do you find the source of the blocking?

 

5. How to quickly find the source of blockage?

Solving the problem quickly is always the first priority. Once a long metadata lock wait occurs, especially on a frequently accessed business table, the table usually becomes inaccessible and all reads and writes are blocked. At that point, finding the source of the blocking comes first.
The most important table here is the performance_schema.metadata_locks table mentioned earlier.

The metadata_locks table was introduced in 5.7 and records metadata lock information, including the locked object, the lock type, the lock status and so on. It is disabled by default in 5.7 (enabled by default in 8.0), so on 5.7 you need to turn it on with the following command:

UPDATE performance_schema.setup_instruments SET ENABLED = 'YES', TIMED = 'YES' WHERE NAME = 'wait/lock/metadata/sql/mdl';
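
To confirm that the instrument is active, the same setup table can be queried. A minimal check:

```sql
-- ENABLED should read YES once metadata lock instrumentation is on.
SELECT name, enabled, timed
FROM performance_schema.setup_instruments
WHERE name = 'wait/lock/metadata/sql/mdl';
```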

 

To make the setting survive a restart, add the following to the configuration file:

[mysqld]
performance-schema-instrument='wait/lock/metadata/sql/mdl=ON'

 

Querying this table alone does not reveal the blocking relationship, nor which statement caused the blocking. For that we need to join two more tables, performance_schema.threads and performance_schema.events_statements_history. The threads table maps the thread id to the id shown in show processlist, and the events_statements_history table provides the SQL statements the transaction has run. The complete SQL after joining is as follows:

SELECT
    locked_schema,
    locked_table,
    locked_type,
    waiting_processlist_id,
    waiting_age,
    waiting_query,
    waiting_state,
    blocking_processlist_id,
    blocking_age,
    -- keep only the statements after the most recent BEGIN of the blocking session
    substring_index(sql_text,"transaction_begin;" ,-1) AS blocking_query,
    sql_kill_blocking_connection
FROM
    (
        -- t1: pair each pending exclusive lock (a) with the granted locks (b)
        -- that other threads hold on the same table
        SELECT
            b.OWNER_THREAD_ID AS granted_thread_id,
            a.OBJECT_SCHEMA AS locked_schema,
            a.OBJECT_NAME AS locked_table,
            "Metadata Lock" AS locked_type,
            c.PROCESSLIST_ID AS waiting_processlist_id,
            c.PROCESSLIST_TIME AS waiting_age,
            c.PROCESSLIST_INFO AS waiting_query,
            c.PROCESSLIST_STATE AS waiting_state,
            d.PROCESSLIST_ID AS blocking_processlist_id,
            d.PROCESSLIST_TIME AS blocking_age,
            d.PROCESSLIST_INFO AS blocking_query,
            concat('KILL ', d.PROCESSLIST_ID) AS sql_kill_blocking_connection
        FROM
            performance_schema.metadata_locks a
        JOIN performance_schema.metadata_locks b ON a.OBJECT_SCHEMA = b.OBJECT_SCHEMA
        AND a.OBJECT_NAME = b.OBJECT_NAME
        AND a.lock_status = 'PENDING'
        AND b.lock_status = 'GRANTED'
        AND a.OWNER_THREAD_ID <> b.OWNER_THREAD_ID
        AND a.lock_type = 'EXCLUSIVE'
        -- c: the waiting thread, d: the blocking thread
        JOIN performance_schema.threads c ON a.OWNER_THREAD_ID = c.THREAD_ID
        JOIN performance_schema.threads d ON b.OWNER_THREAD_ID = d.THREAD_ID
    ) t1,
    (
        -- t2: statement history per thread, with each BEGIN replaced by a marker
        SELECT
            thread_id,
            group_concat(   CASE WHEN EVENT_NAME = 'statement/sql/begin' THEN "transaction_begin" ELSE sql_text END ORDER BY event_id SEPARATOR ";" ) AS sql_text
        FROM
           performance_schema.events_statements_history
        GROUP BY thread_id
    ) t2
WHERE
    t1.granted_thread_id = t2.thread_id \G
   

 

Running this SQL against the previous example gives a clear blocking relationship:

               locked_schema: db1
                locked_table: t1
                 locked_type: Metadata Lock
      waiting_processlist_id: 28
                 waiting_age: 227
               waiting_query: alter table t1 add cl3 int
               waiting_state: Waiting for table metadata lock
     blocking_processlist_id: 27
                blocking_age: 252
              blocking_query: select * from t1
sql_kill_blocking_connection: KILL 27
1 row in set, 1 warning (0.00 sec)

 

According to the result, the thread with processlist_id 27 is blocking thread 28, so we need to kill 27 to release the lock.

 

MySQL itself also ships a similar view for diagnosing metadata lock problems, sys.schema_table_lock_waits. In my experience, however, the results of this view have bugs and are not very accurate, so I recommend using the SQL above.
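
For comparison, the view can be queried as sketched below. This assumes the sys schema is installed, and selects only a subset of its columns. Given the accuracy caveat above, treat the output as a cross-check rather than the final answer.

```sql
-- Who is waiting, who is blocking, and the suggested KILL statement.
SELECT object_schema,
       object_name,
       waiting_pid,
       waiting_query,
       blocking_pid,
       sql_kill_blocking_connection
FROM sys.schema_table_lock_waits;
```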

 

6. How was the case at the beginning of this article finally resolved?

With the background above, the case at the start of this article is easy to explain. The user ran a full-table delete, which took a metadata read lock on the target table. Because the table is very large, the delete ran for a long time and the read lock was not released. Another session then ran drop table, which needs a metadata write lock on the table. Since read and write locks block each other, the drop could only obtain the write lock after the delete finished. On the surface, neither command responded for a long time; in reality, one was executing and the other was waiting.

 

So how was it solved? The failure mechanism was clear from show processlist and the customer's description, so we advised the customer to kill the delete and run the drop once the deleted data had been rolled back. Because the delete had already been running for some time, the rollback could take a while as well. After the customer killed the delete, the drop succeeded.
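
The steps could look roughly like the sketch below. The id 18 is a hypothetical processlist id standing in for the session that runs the delete; the real id has to be read from show processlist.

```sql
-- 1. Find the session running the long delete (check the Info column).
SHOW PROCESSLIST;

-- 2. Stop the statement; 18 is a hypothetical id, replace it with the real one.
KILL QUERY 18;

-- 3. Watch the transaction while InnoDB undoes the deleted rows.
--    trx_state should show ROLLING BACK until the rollback finishes,
--    after which the row disappears from the table.
SELECT trx_mysql_thread_id, trx_state, trx_rows_modified
FROM information_schema.innodb_trx
WHERE trx_mysql_thread_id = 18;
```

Once the transaction is gone, the metadata read lock is released and the waiting drop table can acquire its write lock.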

 

Summary

Most operations in production are DML, and metadata read locks do not wait on each other. Most DDL in MySQL can now run online, so even when a write lock is taken it is quickly downgraded to a read lock, and the probability of DML being blocked while DDL is executing is small. The most likely problem is that an unfinished transaction prevents the DDL from acquiring its metadata write lock, so the DDL waits in the lock queue. Once it is in the queue, the pending write lock blocks later read locks, database connections grow rapidly until they are exhausted, and the business is eventually affected.

 

To avoid similar problems as much as possible, here are a few tips:

  • Be very cautious with DDL on any large table, or any small table that is accessed frequently, in production. It is best to run it during off-peak hours.
  • Avoid large transactions in the design wherever possible. Large transactions cause not only lock problems but also other problems such as replication delay and rollback space filling up.
  • Commit transactions promptly. We often find that a client is set to manual commit and the user forgets to commit after running the SQL, so the transaction stays open for a long time. It is recommended to monitor long-running transactions in the instance so that transactions left uncommitted for any reason are noticed.
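
As a starting point for such monitoring, open InnoDB transactions can be listed by age. This is a sketch only: the 60-second threshold is an arbitrary assumption and should be tuned to the workload.

```sql
-- List InnoDB transactions that have been open for more than 60 seconds.
SELECT trx_mysql_thread_id AS processlist_id,
       trx_started,
       TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS age_seconds,
       trx_state,
       trx_query
FROM information_schema.innodb_trx
WHERE trx_started < NOW() - INTERVAL 60 SECOND
ORDER BY trx_started;
```

A transaction that shows a large age_seconds and a NULL trx_query is typically an idle session that has not committed, which is exactly the kind of session that holds a metadata read lock unnoticed.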

 

Author: Zhai Zhenxing


Origin my.oschina.net/u/4090830/blog/5578249