Exploring the underlying architecture of MySQL: an overview of the design and implementation process

If there happens to be a good-looking reader in front of the screen, please leave a like!
Author: Mr. Raymon, Source Code Times

A few words up front

MySQL, as an excellent and widely used database management system, is an almost indispensable part of daily development for many Java engineers. Whether it is storing massive amounts of data or efficiently retrieving and managing it, MySQL plays an important role. But beyond using MySQL day to day, do we really understand its underlying architecture and how it is designed and implemented? This post takes an in-depth look at the design and implementation of MySQL's underlying architecture, helping you better understand and apply this powerful database system. Let's lift the veil on MySQL's internals together and explore its mysteries.

1. What does Mysql look like in your eyes?

MySQL, in the eyes of most ordinary Java engineers, is often just a tool for storing and manipulating data. We use it to create databases, tables, and indexes, and to add, delete, modify, and query data. These basic usage patterns have become routine operations in our daily work with MySQL.

However, in daily development we usually focus only on how to use MySQL correctly for data operations, and rarely gain a deep understanding of its underlying architecture and implementation principles. We may know little about mechanisms such as storage engines, query optimizers, and transaction management, and have limited knowledge of how to optimize performance, ensure data consistency, or handle backup and recovery.
Precisely because of this, understanding MySQL's underlying architecture and implementation matters: it not only gives us a fuller picture of MySQL's internal mechanisms, but also improves the efficiency and quality of our work. In the following sections we will discuss the components and techniques of MySQL's underlying architecture in depth, hoping to leave you with a deeper and more comprehensive knowledge of MySQL.

2. How does the Java system connect to Mysql?

In Java, connecting to a MySQL database usually requires JDBC (Java Database Connectivity). JDBC is a set of APIs provided by Java for accessing databases. It provides a standard interface that allows us to interact with various databases through Java code.

To connect to a MySQL database, first make sure MySQL is installed on the system and that the appropriate MySQL JDBC driver has been added to the Java project. The MySQL driver builds a bridge between the Java system and the MySQL database:
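To make the pattern concrete, here is a minimal JDBC sketch. The URL, user, and password are hypothetical placeholders, and it assumes MySQL Connector/J is on the classpath; without a driver and a running server, the code simply takes the failure branch instead of crashing.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class MysqlConnectDemo {
    // Hypothetical connection settings -- adjust to your environment.
    static final String URL = "jdbc:mysql://localhost:3306/testdb";
    static final String USER = "root";
    static final String PASSWORD = "password";

    // Tries to open (and immediately close) a connection; returns true on success.
    static boolean canConnect() {
        try (Connection conn = DriverManager.getConnection(URL, USER, PASSWORD)) {
            return conn.isValid(2);
        } catch (SQLException e) {
            // Without the Connector/J jar on the classpath (or a running
            // server), we land here instead of crashing.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canConnect()
                ? "Connected to MySQL"
                : "Could not connect (driver or server missing)");
    }
}
```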

Therefore, when we implement business code and need to execute SQL statements, the MySQL driver passes those statements to the MySQL database for execution.

Now let's think about a question: does a Java system establish just a single connection to the database? Certainly not, and here is why. Suppose we develop a web system in Java and deploy it in Tomcat; Tomcat itself uses multiple threads to process multiple requests concurrently.

So when there are multiple concurrent business requests, we could establish a separate database connection for each request. But in a high-concurrency scenario, should each Tomcat thread really open a connection, execute a SQL statement, and then destroy the connection? Hundreds of threads might repeat this cycle frequently, and that approach is not advisable: establishing a database connection takes time, and destroying the connection after every statement only to rebuild it again is very inefficient.

Therefore, we need to introduce the concept of connection pool to solve this problem. The connection pool maintains a set of reusable database connections and manages the connections efficiently. When the Tomcat thread needs to access the database, it can obtain an available connection from the connection pool, and return the connection to the connection pool after execution. This can reduce the frequent creation and destruction of connections and improve performance. As follows:
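The idea above can be sketched in a few lines of Java. This is a toy pool handing out plain strings instead of real JDBC connections (so it runs anywhere); production code would use a mature pool such as HikariCP or Druid rather than anything hand-rolled.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// A toy connection pool: a fixed set of reusable "connections" in a queue.
public class SimplePool {
    private final BlockingQueue<String> idle;

    public SimplePool(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add("conn-" + i); // a real pool would open JDBC connections here
        }
    }

    // Borrow a connection, waiting up to timeoutMs; null if the pool is exhausted.
    public String acquire(long timeoutMs) throws InterruptedException {
        return idle.poll(timeoutMs, TimeUnit.MILLISECONDS);
    }

    // Return the connection so other threads can reuse it.
    public void release(String conn) {
        idle.offer(conn);
    }

    public int available() {
        return idle.size();
    }
}
```

A Tomcat thread would call `acquire` before running SQL and `release` in a finally block, so connections are recycled instead of being created and destroyed per request.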

3. Why does Mysql also need a connection pool?

You know how you sometimes have to wait in line at the bank? It would waste everyone's time and the bank's resources if every customer had to wait for a staff member to serve only them, right? A MySQL connection pool is like the bank's queuing system: it helps us manage and use database connections more effectively.

  1. Improve connection efficiency: In MySQL, establishing a database connection requires some preparatory work, just as bank staff must prepare before handling business. Re-creating the connection every time would be very inefficient, like every customer queuing for a ticket from scratch. A connection pool creates connections in advance, just as a bank opens several windows ahead of time; a thread only needs to grab an available connection from the pool, which reduces waiting time and improves connection efficiency.

  2. Save system resources: Database connections are a limited resource, just as a bank's staff is limited. If every customer monopolized a staff member, the bank would quickly be paralyzed. A connection pool manages and controls the number of connections, much like the bank controls the number of open windows, ensuring that too many connections are never created and avoiding waste of database and server resources.

  3. Simplify connection management: Connection pooling lets us manage connections more easily, just as a bank's queuing system lets staff focus on customers' business. With a pool, we no longer create and release connections manually; we simply take a connection from the pool, use it, and return it when done. This simplifies connection management and improves development efficiency.

To sum up, a MySQL connection pool is like a bank's queuing system: it improves connection efficiency, saves system resources, and simplifies connection management. The connection pool plays an important role in high-concurrency database workloads, helping us interact with MySQL more efficiently and conveniently.

4. How does Mysql handle connection requests?

When MySQL receives a network connection request, how does it process the request, and how does the SQL ultimately get executed? Let's walk through the steps of the whole chain:

  1. The network connection is assigned to a thread, which listens on the connection and reads the request data, for example reading and parsing a SQL statement sent over the connection by the Java system.
  2. MySQL provides an internal component, the SQL Interface, which is responsible for executing SQL statements.
  3. The query optimizer then selects the optimal query path: for a complex SQL statement of tens, hundreds, or even thousands of lines, it generates a tree of candidate query paths and picks the best one.
  4. The executor is invoked, which calls the storage engine's interface according to the execution plan.
  5. The storage engine interface actually executes the SQL statement: following the plan chosen by the optimizer, the executor calls the storage engine interface in a defined order, and the engine carries out the statement's logic.
  6. The storage engine manages and stores the data. MySQL supports multiple storage engines such as InnoDB, MyISAM, and Memory, and we can choose which one is responsible for executing specific SQL statements. Nowadays MySQL uses the InnoDB storage engine by default.
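The executor-to-storage-engine handoff in steps 4-6 can be sketched with a small interface. This is a simplified model, not MySQL's actual handler API; the class and method names are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// The executor talks to storage engines only through this interface, so
// engines (InnoDB, MyISAM, Memory, ...) are interchangeable -- a simplified
// model of MySQL's pluggable storage engine layer.
interface StorageEngine {
    void write(int id, String value);
    String read(int id);
}

// A toy "Memory"-style engine keeping rows in a HashMap.
class MemoryEngine implements StorageEngine {
    private final Map<Integer, String> rows = new HashMap<>();
    public void write(int id, String value) { rows.put(id, value); }
    public String read(int id) { return rows.get(id); }
}

// The executor follows the plan chosen by the optimizer and calls the
// engine interface; it never needs to know which engine is underneath.
class Executor {
    private final StorageEngine engine;
    Executor(StorageEngine engine) { this.engine = engine; }

    String execute(String op, int id, String value) {
        if (op.equals("update")) { engine.write(id, value); return "OK"; }
        return engine.read(id); // anything else is treated as a read
    }
}
```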

If you are interested in the execution process above, you can study it in depth; this article will not go into those details. Next, let's analyze how the InnoDB storage engine manages and stores our data.

5. Important memory structure of InnoDB: buffer pool

The InnoDB storage engine has a very important in-memory component: the buffer pool (Buffer Pool). It caches a lot of data, so that later queries whose data is already in the buffer pool never need to touch the disk.
For example, take the SQL statement update users set name='xxx' where id=1. InnoDB first checks whether the row with id=1 is in the buffer pool; if not, it loads the row from disk into the buffer pool, and then places an exclusive lock on that row.

The buffer pool uses the LRU (Least Recently Used) algorithm to manage data pages in memory. When a query needs to access data, InnoDB first checks whether the corresponding data page exists in the buffer pool. If present, it fetches the data directly from memory instead of reading from disk, which greatly improves query performance. If the data page is not in the buffer pool, InnoDB will read it into the buffer pool and keep it in memory for subsequent queries.
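As a rough sketch, an LRU-managed page cache can be built in Java on LinkedHashMap's access-order mode. Note that real InnoDB uses a refined midpoint-insertion variant of LRU, not the plain version shown here; the class name is invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy buffer pool: page id -> page contents, evicting the least recently
// used page once capacity is exceeded.
public class LruBufferPool extends LinkedHashMap<Integer, String> {
    private final int capacity;

    public LruBufferPool(int capacity) {
        super(16, 0.75f, true); // accessOrder=true gives LRU iteration order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
        return size() > capacity; // evict when over capacity, like flushing a page out
    }
}
```

Every `get` moves the touched page to the "recently used" end, so cold pages drift toward eviction, which is the behavior the paragraph above describes.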

By properly sizing the buffer pool, frequently used data pages can be kept in memory, improving query efficiency. Larger buffer pools generally suit servers with plenty of memory.
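For reference, the buffer pool size is controlled by the `innodb_buffer_pool_size` variable; the value below is only an illustration and should be tuned to your machine's memory.

```ini
# my.cnf (illustrative values -- tune to your server's RAM)
[mysqld]
# On a dedicated database server this is often set to a large
# fraction of physical memory.
innodb_buffer_pool_size = 4G
```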

6. Undo log files: allowing updated data to be rolled back

Undo log files are used to record the operations of ongoing transactions in the database to provide rollback data when a transaction needs to be rolled back. When an update, delete, or insert operation occurs, the InnoDB engine will record relevant information to the Undo log file.

When a transaction needs to be undone, the InnoDB engine uses the Undo log to restore the data to the state before the transaction started. It undoes modifications to data by reversing the operation and restores the data to its previous state.
Once the record to be updated has been loaded from the disk file into the buffer pool, locked, and had its pre-update old value written to the undo log file, the update can officially begin. The record in the buffer pool is updated first; at that point, the in-memory copy is dirty data.

Here, "updating the data in the memory buffer pool" means changing the name field of the row with id=1 in memory to "xxx".
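A toy model of the undo mechanism: before each update, the old value is pushed onto an undo log, so rollback can restore the pre-transaction state. The class and its in-memory "table" are invented for illustration and ignore real-world details such as MVCC and persistence.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy table plus undo records: before each update we save the old value,
// so the transaction can be rolled back to its starting state.
public class UndoDemo {
    private final Map<Integer, String> table = new HashMap<>();
    private final Deque<Object[]> undoLog = new ArrayDeque<>(); // {id, old value}

    public void put(int id, String value) { table.put(id, value); }
    public String get(int id) { return table.get(id); }

    // Update inside a "transaction": log the old value first, then overwrite.
    public void update(int id, String newValue) {
        undoLog.push(new Object[] { id, table.get(id) });
        table.put(id, newValue);
    }

    // Roll back by replaying undo records newest-first.
    public void rollback() {
        while (!undoLog.isEmpty()) {
            Object[] rec = undoLog.pop();
            table.put((Integer) rec[0], (String) rec[1]);
        }
    }

    public void commit() { undoLog.clear(); } // nothing left to undo once committed
}
```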

7. Redo log files: ensure data consistency and persistence

Now imagine that the modification above has been written into the buffer pool but not yet persisted to disk, and at that moment the MySQL machine crashes. The data in the buffer pool is inevitably lost, and the update is lost with it. Therefore, to guarantee the consistency and durability of MySQL's data, the InnoDB engine introduces redo log files.

The Redo Log is a physical log that records the modifications a transaction makes to the database before it commits. When the database crashes or fails, the Redo Log can be used to restore the database to its last committed state, ensuring durability.

The role of Redo Log is mainly reflected in the following two aspects:

  1. Data recovery: when the database fails, the modifications of committed transactions that had not yet reached the data files can be replayed from the Redo Log, restoring the database to its last committed state.
  2. Improve performance: By recording modification operations in the Redo Log, disk IO operations can be converted into sequential write operations, greatly improving the write performance of the database.

Therefore, when the update is executed, MySQL first writes the in-memory modification into a Redo Log Buffer, another in-memory buffer used to stage redo log records. A redo log record describes the modification you made to the data, for example "for the record with id=10, change the value of the name field to xxx".
Note: innodb_log_buffer_size specifies the size of the Redo Log buffer; the default is 8MB. A larger value reduces frequent flush operations and improves performance, but occupies more memory.

8. Committing the transaction: flushing the redo log

When the transaction is committed, the data in the redo log buffer is flushed to disk. But what about data lost before the commit -- does that matter?

Actually, it doesn't. If the transaction for an update statement was never committed, the statement simply never succeeded. Even though a MySQL crash loses everything in memory, you will find the data on disk still in its original state.

Three strategies for writing redo logs to disk

The flushing strategy is configured through innodb_flush_log_at_trx_commit, which has the following options:

  1. Value 0: the redo log is not flushed to disk at commit, i.e. the asynchronous strategy. When a transaction commits, its redo records stay in the in-memory Redo Log Buffer, and a background thread writes and flushes them to disk roughly once per second. This gives the best write performance, but a crash can lose up to about a second of committed transactions.
  2. Value 1 [default]: the redo log is flushed to disk synchronously. When the transaction commits, the redo records are written to disk immediately and the commit waits for the IO to complete. This guarantees durability at some cost in performance; it is the most commonly used setting and suits most applications.


  3. Value 2: the redo log is written to the OS cache. When a transaction commits, the redo records are written to the operating system's page cache without waiting for a real disk flush; a background thread later flushes them to disk. This offers good performance and a degree of protection -- it survives a MySQL process crash -- but an OS or machine crash can still lose recently committed transactions.
Flush strategy selection
Choosing the right innodb_flush_log_at_trx_commit value depends on your durability and performance requirements. Set it to 1 if durability requirements are strict. Set it to 0 if performance matters most and some data loss is acceptable. Choose 2 if you want better performance while keeping a degree of data protection.

You can adjust the innodb_flush_log_at_trx_commit value by modifying the parameter settings in the MySQL configuration file, and restart the MySQL service to make it take effect.

We usually recommend setting it to 1: when committing a transaction, the redo log must be flushed into the disk file. This strictly guarantees that once a transaction is committed, its data is never lost, because the redo log on disk can restore every modification you made.
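The three strategies can be illustrated with a small simulation. The lists below merely stand in for the log buffer, the OS page cache, and the disk file; this is a mental model of where a committed transaction's redo records end up under each setting, not how InnoDB is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation of innodb_flush_log_at_trx_commit values 0, 1 and 2.
public class FlushPolicyDemo {
    final List<String> logBuffer = new ArrayList<>(); // in-memory redo log buffer
    final List<String> osCache = new ArrayList<>();   // OS page cache
    final List<String> disk = new ArrayList<>();      // redo log file on disk
    final int policy; // 0, 1 or 2

    FlushPolicyDemo(int policy) { this.policy = policy; }

    void commit(String record) {
        logBuffer.add(record);
        if (policy == 0) return;     // value 0: stays in the log buffer until the
                                     // background once-per-second flush
        osCache.add(record);         // values 1 and 2: write to the OS cache
        logBuffer.remove(record);
        if (policy == 1) {
            disk.add(record);        // value 1: fsync before the commit returns
            osCache.remove(record);
        }                            // value 2: the fsync happens later
    }
}
```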

9. What exactly is binlog

In fact, the redo log we discussed above is a log of a physical nature: it records things like "what modification was made to what record in which data page".

The redo log itself is specific to the InnoDB storage engine. The binlog, by contrast, is called the archive log, and what it records leans logical, along the lines of "update the row with id=1 in the users table; the value after the update is such-and-such". The binlog is not an InnoDB-specific log file; it belongs to the MySQL server itself. Therefore, when a transaction is committed, the binlog is written at the same time:
Analysis of the binlog flushing strategy
The binlog also has different flushing strategies, controlled by the sync_binlog parameter. Its default value here is 0 (note: newer MySQL versions default to 1), which means that when the binlog is written it does not go directly into the disk file but into the OS cache. So, just as in the earlier analysis, if the machine goes down at that moment, the binlog data sitting in the OS cache is lost.
If you set sync_binlog to 1, then when the transaction is committed the binlog is forced straight into the disk file. After a commit performed this way, the binlog on disk survives even if the machine goes down.

Completing the transaction commit with binlog and redo log

Once the binlog has been written to its disk file, the final step of the commit takes place: the name of the binlog file for this update and the position of the update within that file are written into the redo log file, and a commit mark is written into the redo log file at the same time. Only after this is done is the transaction commit truly complete.
What is the significance of writing the commit mark into the redo log in this last step?

It keeps the redo log consistent with the binlog. The transaction only counts as successfully committed once the final commit mark is written into the redo log; at that point the redo log contains a record of this update and the binlog contains the corresponding record as well, so the two logs are completely consistent.
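The commit sequence and the recovery rule it enables can be modeled in a few lines. The lists stand in for the redo log and binlog files, and crashAfterStep simulates the machine dying mid-commit; the names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the commit sequence described above: redo records are written
// first, then the binlog, then a commit mark in the redo log. On recovery,
// a transaction counts as committed only if its commit mark reached "disk".
public class CommitMarkDemo {
    final List<String> redoLog = new ArrayList<>(); // stands in for the redo log file
    final List<String> binlog = new ArrayList<>();  // stands in for the binlog file

    // Commit in three steps; crashAfterStep simulates dying mid-commit.
    void commit(String txId, int crashAfterStep) {
        redoLog.add(txId + ":change");   // step 1: redo record
        if (crashAfterStep == 1) return;
        binlog.add(txId + ":change");    // step 2: binlog record
        if (crashAfterStep == 2) return;
        redoLog.add(txId + ":commit");   // step 3: commit mark
    }

    // Recovery rule: treat a transaction as committed only if its mark exists.
    boolean isCommitted(String txId) {
        return redoLog.contains(txId + ":commit");
    }
}
```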

The background IO thread flushes dirty data from memory to disk

MySQL has a background IO thread that, at some point in the future, flushes the modified dirty data in the memory buffer pool back to the data files on disk. Even if MySQL crashes before the IO thread has flushed the dirty data, it doesn't matter: after a restart, MySQL replays the redo log to restore into memory the modifications made by previously committed transactions, and in due course the IO thread flushes the final data to the data files on disk.

10. Summary

The InnoDB storage engine keeps some cached data in memory, such as the buffer pool and the redo log buffer, and maintains undo log files and redo log files on disk; the MySQL server itself additionally has binlog files.

When you perform an update, each SQL statement modifies the cached data in the buffer pool, writes the undo log, and writes the redo log buffer. When you commit the transaction, the redo log is flushed to disk, the binlog is flushed to disk, and the transaction commit mark is written to the redo log. Finally, the background IO thread flushes the dirty data in the buffer pool back to disk at some later time.


Origin blog.csdn.net/u014494148/article/details/131909510