"High Performance MySQL" Reading Notes (Part 1)

Table of contents

MySQL architecture

Locks in MySQL

Transactions in MySQL

Transaction characteristics

Isolation level

Transaction log

Multi-version concurrency control MVCC

Physical factors affecting MySQL performance

InnoDB buffer pool

MySQL commonly used data types and optimization

String type

Date and time type

Data identifier


MySQL architecture

By default, each client connection gets a thread of its own inside the server process, and that connection's queries execute only in this single thread, which in turn resides on one core or CPU. The server maintains a cache of ready threads, so threads do not have to be created and destroyed for every new connection.
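
For instance, the size of this thread cache and the current thread counts can be checked with standard server variables and status counters; a quick sketch:

    -- how many unused threads the server keeps ready for new connections
    SHOW VARIABLES LIKE 'thread_cache_size';

    -- Threads_cached, Threads_connected, Threads_created, Threads_running;
    -- a steadily growing Threads_created suggests the cache is too small
    SHOW STATUS LIKE 'Threads%';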

Locks in MySQL

The granularity of MySQL locks:

  • Row lock: allows high concurrency but incurs high system overhead (for example, frequent switching between kernel mode and user mode).

  • Table lock: allows little concurrency but incurs low system overhead.

MySQL read-write lock:

  • Read lock: also called a shared lock; concurrent readers of a shared resource do not block one another. An ordinary query statement in InnoDB takes no row locks at all; appending FOR SHARE after the query adds a shared (read) lock, and appending FOR UPDATE adds an exclusive (write) lock.

  • Write lock: also called an exclusive lock; it blocks other concurrent access to the shared resource. In MySQL, the INSERT, UPDATE, and DELETE statements all acquire their own write locks.

Note: inserts, deletes, and updates are locked implicitly; a query statement takes a lock only when locked explicitly, as in the sketch below.
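
A minimal sketch of the difference, assuming a hypothetical account table:

    -- an ordinary SELECT is a non-locking read in InnoDB
    SELECT balance FROM account WHERE id = 1;

    -- explicit shared (read) lock; written LOCK IN SHARE MODE before MySQL 8.0
    SELECT balance FROM account WHERE id = 1 FOR SHARE;

    -- explicit exclusive (write) lock
    SELECT balance FROM account WHERE id = 1 FOR UPDATE;

    -- UPDATE / DELETE / INSERT take exclusive locks implicitly
    UPDATE account SET balance = balance - 100 WHERE id = 1;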

Deadlock:

A deadlock occurs when two or more transactions hold locks on the same resources and request each other's locks, producing a circular dependency.

InnoDB currently handles deadlocks by rolling back the transaction that holds the fewest row-level exclusive locks. The InnoDB storage engine detects the lock cycle on its own, and as soon as a deadlock is found, an error is returned immediately.

Once a deadlock occurs, it cannot be broken without rolling back one of the transactions.
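
A classic way to produce one, sketched with the same hypothetical account table and two client sessions:

    -- session A
    BEGIN;
    UPDATE account SET balance = balance - 10 WHERE id = 1;  -- A locks row 1

    -- session B
    BEGIN;
    UPDATE account SET balance = balance - 10 WHERE id = 2;  -- B locks row 2

    -- session A
    UPDATE account SET balance = balance + 10 WHERE id = 2;  -- A waits for B's lock

    -- session B
    UPDATE account SET balance = balance + 10 WHERE id = 1;  -- circular wait;
    -- InnoDB detects the deadlock and rolls one transaction back with
    -- ERROR 1213 (40001): Deadlock found when trying to get lock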

Transactions in MySQL

Transaction characteristics

MySQL transactions (of MySQL's common storage engines, only InnoDB supports transactions):

A transaction is a group of SQL statements that is processed atomically as a single unit of work: either all of the statements in the transaction execute successfully, or none of them take effect.

Four characteristics of transactions:

  • Atomicity: a transaction must be treated as an indivisible unit of work; all of its operations either commit successfully or are rolled back on failure.

  • Consistency: can be understood simply as conservation; the database always moves from one consistent state to the next. For example, in a transfer between two accounts (sketched after this list), the total amount of money in the two accounts is the same before and after the transfer.

  • Isolation: the most common manifestation is that the modifications made by a transaction are not visible to other transactions until it finally commits.

  • Durability: once a transaction commits successfully, its modifications are saved permanently in the database.
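
The transfer example above as a minimal sketch (same hypothetical account table):

    START TRANSACTION;
    UPDATE account SET balance = balance - 100 WHERE id = 1;
    UPDATE account SET balance = balance + 100 WHERE id = 2;
    COMMIT;      -- both updates become permanent together, or
    -- ROLLBACK; -- on error, neither change is kept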

Isolation level

The MySQL default isolation level is REPEATABLE READ.

  • READ UNCOMMITTED: inside a transaction you can see modifications that other transactions have not yet committed. This isolation level is rarely used.

  • READ COMMITTED: after it starts, a transaction sees only modifications that other transactions have already committed, and its own modifications are invisible to other transactions until it commits. (Non-repeatable reads still occur at this level: executing the same SQL statement twice in the same transaction may return different results.)

  • REPEATABLE READ: solves the non-repeatable-read problem of the READ COMMITTED level by guaranteeing that reading the same rows several times within one transaction returns the same result. It cannot by itself solve the phantom-read problem. (Phantom read: while one transaction is reading a range of records, another transaction inserts a new record into that range; when the first transaction reads the range again, the new row appears as a phantom.)

  • SERIALIZABLE: transactions are executed strictly one after another, which amounts to pervasive locking and blocking, so it can cause many timeout and lock-contention problems. This isolation level is rarely used in production environments.

Isolation level     Dirty read?   Non-repeatable read?   Phantom read?
READ UNCOMMITTED    yes           yes                     yes
READ COMMITTED      no            yes                     yes
REPEATABLE READ     no            no                      yes
SERIALIZABLE        no            no                      no
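
The level can be inspected and changed per session; a sketch (the variable is transaction_isolation in MySQL 8.0, tx_isolation in older versions):

    SELECT @@transaction_isolation;   -- REPEATABLE-READ by default

    -- change it for the current session only
    SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

    -- or for the next transaction only
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;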

Transaction log

Transaction logs make transactions more efficient. The storage engine only needs to change its in-memory copy of the data, rather than modifying the table on disk every time, and then write a record of the change to the transaction log, which is persisted on disk.

Because the transaction log is written by appending, the writes are sequential IO within a small area of the disk rather than random IO, so writing the transaction log is a relatively fast operation; a background process then updates the table on disk at some later time. (The change thus hits the disk twice: once written sequentially to the log, and once when the table on disk is updated.)

Multi-version concurrency control MVCC

Multi-version concurrency control (MVCC) can be thought of as a variant of row-level locking, but MVCC avoids locking operations in many cases, so its overhead is lower.

Briefly, the idea and some of the design of MVCC: it works by using a snapshot of the data as of a certain point in time.

This means that no matter how long a transaction runs, it sees a consistent view of the data; it also means that different transactions can see different data in the same table at the same moment.

MySQL largely solves the phantom-read problem through MVCC (complemented by the gap locks described below).

Each storage engine implements MVCC in its own way. InnoDB implements MVCC by assigning each transaction a transaction ID when it begins. The ID is assigned the first time the transaction reads any data (a read-only transaction's ID is always 0), and every subsequent modification is tagged with this transaction ID, with old row versions kept chained together, rather like a linked list holding the data from different points in time. A later read walks this chain, comparing transaction IDs, to find and return the row version that belongs to its consistent view.

Note: MVCC applies only to the REPEATABLE READ and READ COMMITTED isolation levels.

InnoDB defaults to the REPEATABLE READ isolation level and uses a gap-locking strategy to prevent phantom reads at that level: InnoDB locks not only the rows involved in a query but also the gaps between them in the index structure, so that phantom rows cannot be inserted.
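
A two-session sketch of the consistent snapshot under REPEATABLE READ (hypothetical account table again):

    -- session A
    BEGIN;
    SELECT COUNT(*) FROM account;   -- say it returns 10; the read view is created here

    -- session B
    INSERT INTO account (id, balance) VALUES (11, 0);
    COMMIT;

    -- session A, same transaction
    SELECT COUNT(*) FROM account;   -- still 10: the snapshot from the first read is reused
    COMMIT;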

Physical factors affecting MySQL performance

The performance of a MySQL server is limited by the weakest link in the whole system, and the operating system and hardware hosting the server are the most important limiting factors. The most common ones are disk space, available memory, CPU, network, and disk type (disks dominate IO, so use solid-state drives wherever you can).

Configuring a large amount of memory for the MySQL server is not about keeping a large amount of data in memory for its own sake, but about reducing the number of disk IOs: accessing data through disk IO is several orders of magnitude slower than accessing it directly in memory.

MySQL reading, writing, and caching: with enough memory, the number of disk IOs can be greatly reduced, because if the data fits in memory, then once the server has warmed up its cache, every data read is a cache hit. Logical reads are still performed from memory, but no data is physically read from disk. Writes can also be performed in memory, but the data must eventually be written to disk for persistence. In other words, the cache can delay write operations, but unlike reads, the cache cannot eliminate the disk IO for writes.

In fact, the presence of a cache lets writes be combined with other techniques beyond merely deferring them:

  • Many write operations, one flush (this design not only reduces the number of IOs but also turns random writes to disk into sequential IO).

    • A piece of data can be changed many times in memory without writing the new value to disk each time. When the data is finally flushed to disk, all modifications since the last physical write are persisted at once.

  • IO merge

    • Many different pieces of data can be modified in memory, and the modifications collected together, so that they can be performed as a single physical disk operation.

Write operations can benefit from caching, which can turn random IO into sequential IO.

Memory and swap:

Swapping occurs when the operating system writes some virtual memory out to disk because physical memory cannot hold it; these extra write operations also shorten the disk's overall lifespan. Swap can of course be turned off, which completely eliminates its negative impact, but then you must accept that processes may be killed when memory is exhausted.

InnoDB buffer pool

The InnoDB buffer pool caches not only indexes but also row data, the adaptive hash index, the change buffer, locks, and other internal structures. InnoDB also uses the buffer pool to implement delayed writes, so that multiple write operations can be merged together and then performed sequentially. InnoDB depends heavily on the buffer pool, so make sure to allocate enough memory for it.

Of course, a large buffer pool also brings problems of its own, such as longer shutdown and warm-up times. If the buffer pool holds many dirty pages (pages modified in memory but not yet synchronized to disk, so that the data in memory differs from the data on disk), InnoDB may take a long time to shut down, because it writes the dirty pages out to the data files during shutdown.

By default, InnoDB uses background threads to flush dirty pages, merging the writes and performing them sequentially for efficiency. When the percentage of dirty pages exceeds a configured threshold, InnoDB flushes pages as aggressively as it can to bring the number of dirty pages back down.
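
The threshold and the current dirty-page count can be observed like this (a sketch using standard variable and information_schema names):

    -- dirty-page percentage at which InnoDB starts flushing aggressively
    SHOW VARIABLES LIKE 'innodb_max_dirty_pages_pct';

    -- current dirty pages (MODIFIED_DATABASE_PAGES) per buffer pool instance
    SELECT POOL_ID, POOL_SIZE, MODIFIED_DATABASE_PAGES
    FROM information_schema.INNODB_BUFFER_POOL_STATS;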

Transaction log: InnoDB uses its log to reduce the cost of committing transactions. Instead of flushing the buffer pool to disk every time a transaction commits, it records the transaction in the log (appending to the log, which avoids random IO); with this transaction log, InnoDB converts random disk IO into sequential IO. InnoDB must still eventually write the changed data to the data files, because the log is fixed in size and written circularly: on reaching the end of the log, it wraps around to the beginning. If a log record holds changes that have not yet been applied to the data files, it cannot be overwritten, since that would erase the only permanent record of the committed transaction.

Transaction logs are recorded in contiguous disk space. In the InnoDB engine, when a transaction commits, all of that transaction's log records must first be written to the redo log and undo log files on disk for persistence.

Log buffer: when InnoDB modifies data, it writes the change record into a log buffer kept in memory. When the buffer fills up, when a transaction commits, or once per second (whichever happens first), InnoDB flushes the buffer to the log files on disk. Compared with InnoDB's normal data, log entries are very compact.

How InnoDB flushes the log buffer:

InnoDB locks the buffer with a mutex, flushes it up to the desired position, and then moves any remaining entries to the front of the buffer. By the time the mutex is released, more than one transaction may be ready to flush its log entries, so InnoDB provides a group-commit feature that can commit a group of them in a single IO operation. The log buffer must be flushed to persistent storage to ensure that committed transactions are fully durable.

innodb_flush_log_at_trx_commit controls where the log buffer is flushed to and how often:

  • 0: Write the log buffer to the log file and flush the log file to disk once per second, but do nothing when a transaction commits.

  • 1: Write the log buffer to the log file and flush it to persistent storage every time a transaction commits. This is the default (and the safest) setting.

  • 2: Write the log buffer to the log file at every transaction commit, but do not flush; InnoDB flushes on a once-per-second schedule. The main difference from setting 0 is that if only the MySQL process crashes, setting 2 loses no transactions; if the whole server crashes or loses power, transactions can still be lost.

Note: on most operating systems, writing the buffer to the log file merely moves the data from InnoDB's in-memory buffer into the operating system's cache, still in memory; it does not actually put the data on persistent storage. So in the event of a crash or power failure, settings 0 and 2 usually lose at most about one second of data, since that data may exist only in the operating system's cache.
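
A sketch of inspecting and changing the setting (it is a dynamic global variable, so changing it needs sufficient privileges and trades durability for speed):

    SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';   -- 1 by default

    -- relax durability: write the log at each commit but flush it only about
    -- once per second (setting 2 described above)
    SET GLOBAL innodb_flush_log_at_trx_commit = 2;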

MySQL commonly used data types and optimization

String type

Variable string varchar:

  • varchar stores variable-length strings and is more space-efficient than the fixed-length types, because it uses only as much space as it needs.

    varchar needs 1 or 2 additional bytes to record the string's length: 1 byte if the column's maximum length is 255 bytes or less, otherwise 2 bytes. For example, varchar(10) needs up to 11 bytes of storage space, while varchar(1000) needs up to 1002 bytes.

    Note: because the row is variable-length, it may grow when data is updated, which creates extra work. If a row grows so that its original location can no longer hold it, the exact behavior depends on the storage engine; InnoDB, for example, may need to split pages to accommodate the row.

It is better to use varchar in the following situations:

  • The maximum length of the string is much larger than the average length;

  • the columns are rarely updated, so fragmentation is not a problem.

Fixed string char:

  • char is fixed-length: MySQL always allocates enough space for the defined string length, and when char values are stored, MySQL removes all trailing spaces.

char is well suited to storing very short strings, or columns where all values are nearly the same length. Fixed-length values are not prone to fragmentation.
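
A sketch of choosing between the two for a hypothetical table:

    CREATE TABLE person (
        -- lengths vary widely and the column is rarely updated: VARCHAR fits
        email        VARCHAR(100),
        -- always exactly two characters: CHAR avoids the length prefix
        country_code CHAR(2),
        -- all values are the same short length: CHAR stays compact and unfragmented
        password_md5 CHAR(32)
    );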

Date and time type

DATETIME: this type stores a large range of values, from the year 1000 to the year 9999, with a precision of up to one microsecond. It stores the date and time packed into an integer in YYYYMMDDHHMMSS format, independent of time zone, but it takes 8 bytes of storage.

TIMESTAMP (time stamp): stores the number of seconds elapsed since midnight GMT on January 1, 1970. It needs only 4 bytes, so its range is much smaller than DATETIME's: it can represent only 1970 through January 19, 2038. The value a TIMESTAMP displays depends on the time zone.
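
A sketch showing the time-zone dependence of TIMESTAMP versus DATETIME (hypothetical table t):

    CREATE TABLE t (dt DATETIME, ts TIMESTAMP);

    SET time_zone = '+00:00';
    INSERT INTO t VALUES ('2023-01-01 12:00:00', '2023-01-01 12:00:00');

    SET time_zone = '+08:00';
    SELECT dt, ts FROM t;
    -- dt is returned unchanged:        2023-01-01 12:00:00
    -- ts is converted to the session:  2023-01-01 20:00:00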

Data identifier

Data identifier: in general, an identifier uniquely identifies a row of data; the familiar id column is the most common example. An identifier may be part or all of the primary key.

After choosing a data type for an identifier, make sure the same data type is used in all related tables; otherwise multi-table join queries can run into problems (the identifier column should keep exactly the same data type as the corresponding column in any joined table).

  • Integer types: integers are usually the best choice for identifiers because they are fast and work with AUTO_INCREMENT, a column attribute that automatically generates an integer value for each new row. This kind of identifier has one drawback: an integer column can unexpectedly run out of values and bring the service down, so if you choose an integer type for an identifier, make sure to pick a size appropriate for the expected data growth.

  • String types: string types should be avoided as identifier data types if at all possible, since they consume a lot of space and are generally slower than integer types, especially in indexes.

Also, be especially careful with completely random strings, such as those generated by MD5(), SHA1(), or UUID(). Values produced by these functions are distributed arbitrarily over a large space, which slows down inserts and certain types of SELECT queries (see the sketch after this list):

  • INSERT queries are slowed because each inserted value lands at a random location in the index, which can lead to page splits, random disk access, and, for clustered storage engines, clustered index fragmentation.

  • SELECT queries are also slower, because logically adjacent rows end up spread widely across disk and memory.

  • For all types of queries, random values make caches perform poorly, because they break the locality of reference that caching depends on.
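
A sketch contrasting the two choices (hypothetical orders tables):

    -- compact, monotonically increasing key: new rows append to the index
    CREATE TABLE orders (
        id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    -- random 36-character key (e.g. filled with UUID()): inserts land at random
    -- index positions, causing page splits and a fragmented clustered index
    CREATE TABLE orders_uuid (
        id         CHAR(36) NOT NULL PRIMARY KEY,
        created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
    );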

MySQL will index NULL values, but Oracle will not.


Source: blog.csdn.net/weixin_53142722/article/details/129208989