MySQL logical structure and query process

Table of Contents

1. MySQL logical architecture diagram

     1.1 Connection layer

     1.2 Service layer

     1.3 Engine layer

     1.4 Storage layer

2. MySQL query process

     2.1 Client/Server communication protocol

     2.2 Query cache

     2.3 Syntax parsing and preprocessing

     2.4 Query optimization

     2.5 Query execution engine

     2.6 Return the result to the client


1. MySQL logical architecture diagram

     1.1 Connection layer

       Connectors: The top layer consists of the clients and the connection services.

     1.2 Service layer

       Management Services & Utilities: System management and control tools.

       Connection Pool: When a client connects, the server's thread-handling component accepts the connection and creates a new thread for it in the server process's address space. The user's query requests run in that thread, and the results are buffered there before being returned to the client. A thread manager is responsible for reusing and destroying threads.
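       The thread reuse described above can be sketched as follows. This is only an illustration, assuming Python's ThreadPoolExecutor as a stand-in for the thread manager; the names handle_connection and serve_clients are hypothetical and do not correspond to anything in the MySQL source.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_connection(client_id):
    """Pretend to serve one client and report which worker thread ran it."""
    return client_id, threading.current_thread().name


def serve_clients(client_ids, max_threads=2):
    # The executor plays the role of the thread manager: it creates at most
    # max_threads workers and reuses them for later connections instead of
    # creating one brand-new thread per request.
    with ThreadPoolExecutor(max_workers=max_threads,
                            thread_name_prefix="conn") as pool:
        return list(pool.map(handle_connection, client_ids))


results = serve_clients(range(6))
distinct_threads = {name for _, name in results}
print(len(results), "clients served by", len(distinct_threads), "thread(s)")
```

       Six simulated clients are served by no more than two worker threads, which is the point of reusing threads rather than destroying them after each connection.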

       SQL Interface: Accepts the user's SQL commands and returns the results the user asked for.

       Parser: Analyzes the syntax and semantics of SQL statements, classifies them by type of operation, and forwards each one to the appropriate next step.

       Optimizer: The query optimizer. Before an SQL statement is executed, MySQL uses the query optimizer to optimize it. Based on the query sent by the client and on statistics kept in the database, it applies a series of algorithms to arrive at what it estimates to be the optimal strategy, which tells the subsequent stages how to obtain the result of the query.

       Cache and Buffer: The query cache. Its main function is to cache in memory the result sets of SELECT requests that clients submit to MySQL. Whenever the data or structure of a table used by a cached query changes, MySQL invalidates the cached result. In an application with a very high ratio of reads to writes, the query cache can significantly improve performance, at the price of additional memory consumption.

     1.3 Engine layer

       Pluggable Storage Engines: The storage engine is what actually stores and retrieves data in MySQL, and the server communicates with it through the storage engine API. Different storage engines offer different features. Because MySQL's storage engines are pluggable, we can choose one according to our actual needs.

     1.4 Storage layer

       Data storage layer: Stores the data on a file system running on a raw device and handles the interaction with the storage engine.

2. MySQL query process

       The essence of MySQL query optimization is to follow a set of principles so that the MySQL optimizer behaves in the way we expect and the query meets its business goals. The following figure shows the query process of MySQL.

     2.1 Client/Server communication protocol

       The MySQL client/server communication protocol is "half-duplex": at any given moment, either the server is sending data to the client or the client is sending data to the server. The two cannot happen at the same time. Once one end starts sending a message, the other end must receive the complete message before it can respond, so a message cannot (and need not) be cut into small pieces and sent independently, and there is no flow control.

  The client sends a query to the server in a single data packet, so when the query statement is very long, the max_allowed_packet parameter may need to be raised. Note that if the query exceeds this limit, the server refuses to receive any more data and returns an error.
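  To make the size limit concrete, here is a minimal sketch of how a query could be framed and checked. It assumes the standard MySQL packet layout (a 3-byte little-endian payload length, a 1-byte sequence number, then the payload, with 0x03 as the COM_QUERY command byte). The function frame_query, the exception PacketTooLarge, and the tiny 1024-byte limit are hypothetical, and the sketch ignores the splitting of payloads of 16 MB or more across several packets.

```python
import struct

MAX_ALLOWED_PACKET = 1024  # hypothetical, deliberately tiny limit for the demo
COM_QUERY = b"\x03"        # command byte that introduces a text query


class PacketTooLarge(Exception):
    """Raised when the payload exceeds the configured packet limit."""


def frame_query(sql, sequence_id=0, limit=MAX_ALLOWED_PACKET):
    payload = COM_QUERY + sql.encode("utf-8")
    if len(payload) > limit:
        # Models the server refusing an oversized query.
        raise PacketTooLarge(f"{len(payload)} bytes > limit of {limit}")
    # Header: 3-byte little-endian payload length + 1-byte sequence number.
    header = struct.pack("<I", len(payload))[:3] + bytes([sequence_id])
    return header + payload


packet = frame_query("SELECT 1")
print(packet[:4].hex(), packet[4:])
```

  In this model, a query longer than the limit fails before anything is sent, which mirrors the error a real server returns when max_allowed_packet is too small for the statement.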

  In contrast, the server's response usually contains a lot of data and consists of multiple data packets. When the server responds, the client must receive the entire result; it cannot simply take the first few rows and then tell the server to stop sending. In practice, therefore, it is a good habit to keep queries as simple as possible, return only the necessary data, and reduce the size and number of packets exchanged. This is one of the reasons to avoid SELECT * and to add LIMIT clauses to queries.
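  The effect of selecting only the needed columns and adding LIMIT can be illustrated roughly as follows. This sketch uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a MySQL server; the users table, its columns, and the payload_size helper are hypothetical, and the character count is only a crude proxy for bytes on the wire.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, bio TEXT)")
conn.executemany(
    "INSERT INTO users (name, bio) VALUES (?, ?)",
    [(f"user{i}", "x" * 1000) for i in range(100)],
)


def payload_size(rows):
    # Crude proxy for transferred data: total characters across all values.
    return sum(len(str(value)) for row in rows for value in row)


wide = conn.execute("SELECT * FROM users").fetchall()
narrow = conn.execute("SELECT id, name FROM users ORDER BY id LIMIT 10").fetchall()

print(payload_size(wide), "vs", payload_size(narrow))
```

  The narrow query returns ten short rows instead of one hundred wide ones, so far less data has to travel back to the client.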

     2.2 Query cache

  Before parsing a query, if the query cache is enabled, MySQL checks whether the query hits an entry in the query cache. If it does, MySQL checks the user's permissions once and then returns the cached result directly. In this case the query is not parsed, no execution plan is generated, and nothing is executed. (The query cache exists in MySQL 5.7 and earlier; it was removed in MySQL 8.0.)

  MySQL stores the cache in a lookup table (a data structure similar to a HashMap) indexed by a hash value. The hash is computed from the query text itself, the current database, the client protocol version, and any other information that may affect the result. Consequently, two queries that differ in even a single character (for example, a space or a comment) will not hit the same cache entry.
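  The byte-for-byte matching can be sketched like this. It is a simplified model, not MySQL's actual implementation: the choice of SHA-256, the key layout, the default database name "shop", and the functions cache_key, store and lookup are all assumptions made for illustration.

```python
import hashlib

query_cache = {}  # hash key -> cached rows


def cache_key(sql, database, protocol_version):
    # The key covers the raw query text plus everything else that could
    # change the result; the text is NOT normalized in any way.
    raw = f"{database}\x00{protocol_version}\x00{sql}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def store(sql, rows, database="shop", protocol_version=10):
    query_cache[cache_key(sql, database, protocol_version)] = rows


def lookup(sql, database="shop", protocol_version=10):
    return query_cache.get(cache_key(sql, database, protocol_version))


store("SELECT id FROM orders", [(1,), (2,)])
hit = lookup("SELECT id FROM orders")                 # identical text
miss_space = lookup("SELECT  id FROM orders")         # one extra space
miss_db = lookup("SELECT id FROM orders", database="other")
print(hit, miss_space, miss_db)
```

  Only the first lookup finds the entry; the extra space and the different database each produce a different key.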

  If a query contains user-defined functions, stored functions, user variables, or temporary tables, or references system tables in the mysql database, its result is not cached. For example, NOW() and CURRENT_DATE() return different results depending on when the query runs, and queries containing CURRENT_USER or CONNECTION_ID() return different results for different users, so caching such results makes no sense.
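  A rough check for such queries might look like the sketch below. It is a deliberately crude text match covering only the functions named above plus user variables; the function is_cacheable is hypothetical, and a real server decides this from the parsed statement rather than from a regular expression.

```python
import re

# Only the functions mentioned in the text; the real list is longer.
NON_DETERMINISTIC = ("NOW", "CURRENT_DATE", "CURRENT_USER", "CONNECTION_ID")
_PATTERN = re.compile(r"\b(" + "|".join(NON_DETERMINISTIC) + r")\b", re.IGNORECASE)


def is_cacheable(sql):
    if not sql.lstrip().upper().startswith("SELECT"):
        return False          # only SELECT results are cached
    if _PATTERN.search(sql):
        return False          # result depends on time or on the session
    if "@" in sql:
        return False          # crude test for user variables such as @x
    return True


print(is_cacheable("SELECT name FROM users WHERE id = 1"))
print(is_cacheable("SELECT * FROM logs WHERE created < NOW()"))
```

  Note that the "@" test would also reject a query containing an e-mail address in a string literal, which is one reason this can only be a sketch.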

  MySQL's query cache tracks every table involved in a cached query. If the data or structure of one of these tables changes, all cached entries related to that table are invalidated. As a result, on every write MySQL must invalidate all cache entries for the affected table. If the query cache is very large or fragmented, this can consume a lot of system resources and may even make the system stall for a while. The overhead is not limited to writes, either: when a cacheable query finishes executing, its result has to be stored in the cache, which also costs extra. The query cache therefore does not improve performance in every situation. Caching and invalidation both carry overhead, and performance improves only when the resources saved by the cache exceed the resources it consumes. If the system does have performance problems, you can try enabling the query cache together with some optimizations in the database design, such as:

  1) Use multiple small tables instead of one big table (be careful not to over-design)

  2) Use batch inserts instead of single-row inserts in a loop, to reduce the number of disk I/O operations

  3) Keep the size of the cache reasonable. Generally speaking, a size in the tens of megabytes is appropriate

  4) Use SQL_CACHE and SQL_NO_CACHE to control whether an individual query is cached
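  The table-level invalidation described before the list can be modeled with a small sketch. The class QueryCache and its methods are hypothetical and only illustrate the bookkeeping; they say nothing about how MySQL lays out its cache in memory.

```python
from collections import defaultdict


class QueryCache:
    def __init__(self):
        self._results = {}                  # sql text -> cached rows
        self._by_table = defaultdict(set)   # table name -> sql texts using it

    def store(self, sql, tables, rows):
        self._results[sql] = rows
        for table in tables:
            self._by_table[table].add(sql)

    def get(self, sql):
        return self._results.get(sql)

    def invalidate_table(self, table):
        """Drop every cached result that touches `table`; return how many."""
        removed = 0
        for sql in self._by_table.pop(table, set()):
            if self._results.pop(sql, None) is not None:
                removed += 1
        return removed


cache = QueryCache()
cache.store("SELECT * FROM orders", ["orders"], [(1,)])
cache.store("SELECT * FROM orders JOIN users USING (user_id)",
            ["orders", "users"], [(1, "ann")])
cache.store("SELECT * FROM users", ["users"], [(7,)])

dropped = cache.invalidate_table("orders")   # any write to `orders`
print(dropped, cache.get("SELECT * FROM users"))
```

  A single write to orders throws away both entries that mention it, including the join, while the users-only entry survives. This is why write-heavy tables get little benefit from the cache.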

     2.3 Syntax parsing and preprocessing

  MySQL parses an SQL statement by its keywords and generates a corresponding parse tree. In this step the parser validates the statement against the grammar rules, for example checking whether a wrong keyword is used or whether the keywords appear in the correct order. Preprocessing then checks whether the parse tree is valid according to MySQL's rules, for example whether the tables and columns being queried exist.
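  The two stages can be illustrated with a toy example. This is a minimal sketch that understands only statements of the form SELECT columns FROM table; the SCHEMA catalog, the dictionary used as a "parse tree", and the names parse, preprocess, ParseError and ResolveError are all hypothetical.

```python
import re

SCHEMA = {"users": {"id", "name", "email"}}  # hypothetical catalog


class ParseError(Exception):
    """The statement violates the (toy) grammar."""


class ResolveError(Exception):
    """The statement is well-formed but names something that does not exist."""


_SELECT = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def parse(sql):
    # Stage 1: grammar only. Are the keywords present and in the right order?
    match = _SELECT.match(sql)
    if match is None:
        raise ParseError(f"cannot parse: {sql!r}")
    columns = [c.strip() for c in match.group("cols").split(",")]
    return {"type": "select", "columns": columns, "table": match.group("table")}


def preprocess(tree, schema=SCHEMA):
    # Stage 2: do the referenced table and columns actually exist?
    table = tree["table"]
    if table not in schema:
        raise ResolveError(f"unknown table {table!r}")
    for column in tree["columns"]:
        if column != "*" and column not in schema[table]:
            raise ResolveError(f"unknown column {column!r} in {table!r}")
    return tree


tree = preprocess(parse("SELECT id, name FROM users"))
print(tree)
```

  A statement such as "FROM users SELECT id" fails in parse, whereas "SELECT age FROM users" parses correctly and only fails in preprocess.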

     2.4 Query optimization

  Once the parse tree produced by the previous steps is considered valid, the optimizer converts it into a query plan. In most cases a query can be executed in many different ways that all return the same result, and the optimizer's job is to find the best execution plan among them. MySQL uses a cost-based optimizer: it tries to predict the cost of executing the query under each candidate plan and chooses the one with the lowest cost. You can see the estimated cost of the last query by reading the value of Last_query_cost for the current session. MySQL may still choose a poor plan for several reasons: the statistics may be inaccurate, the cost estimate ignores operations outside the optimizer's control (user-defined functions, stored procedures), and MySQL's idea of "best" may differ from ours. We want the execution time to be as short as possible, but MySQL picks the plan with the lowest estimated cost, and the lowest cost does not always mean the shortest execution time.
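  A toy cost model shows both the selection and how stale statistics can mislead it. Everything here is an assumption for illustration: the Plan class, the two candidate plans, and the per-row cost figures (1.0 for a sequential read, 4.0 for a lookup through an index) are invented and are not MySQL's real cost constants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    rows_examined: int
    cost_per_row: float

    @property
    def cost(self):
        return self.rows_examined * self.cost_per_row


def candidate_plans(table_rows, selectivity):
    """Build candidate plans from table statistics (row count, selectivity)."""
    matching = max(1, round(table_rows * selectivity))
    return [
        Plan("full_table_scan", table_rows, 1.0),  # read every row sequentially
        Plan("index_lookup", matching, 4.0),       # fewer rows, pricier each
    ]


def choose_plan(plans):
    return min(plans, key=lambda plan: plan.cost)


# The statistics say 1% of rows match, so the index looks far cheaper.
estimated = choose_plan(candidate_plans(100_000, 0.01))
# If 50% of rows really match, a full scan would have been the better choice.
ideal = choose_plan(candidate_plans(100_000, 0.5))
print(estimated.name, ideal.name)
```

  With accurate statistics the optimizer would have picked the full scan in the second case; with the outdated 1% estimate it picks the index lookup and pays more than expected.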

     2.5 Query execution engine

  After the parsing and optimization phases, MySQL has an execution plan, and the query execution engine carries out the plan's instructions step by step to obtain the result. Most of the work during execution is done by calling interfaces implemented by the storage engine, known as the handler API. Each table in a query is represented by a handler instance. In fact, MySQL creates a handler instance for each table during the optimization stage, and the optimizer uses these instances to obtain information about the table, including its column names and index statistics. The storage engine interface is very rich in functionality, yet at the bottom there are only a few dozen interface functions, which combine like building blocks to perform most of the operations of a query.
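  The building-block idea can be sketched with a tiny interface. The Handler class below and its scan and lookup methods are hypothetical simplifications and do not mirror the method names of MySQL's real handler class; the point is only that the executor needs a handful of primitive operations to run a join.

```python
from abc import ABC, abstractmethod


class Handler(ABC):
    """One instance per table; the executor only talks to this interface."""

    @abstractmethod
    def scan(self):
        """Yield every row of the table."""

    @abstractmethod
    def lookup(self, key):
        """Return the row whose key column equals `key`, or None."""


class InMemoryHandler(Handler):
    """A toy 'storage engine' that keeps rows in a list plus a dict index."""

    def __init__(self, rows, key_column):
        self._rows = list(rows)
        self._index = {row[key_column]: row for row in self._rows}

    def scan(self):
        yield from self._rows

    def lookup(self, key):
        return self._index.get(key)


def nested_loop_join(outer, inner, join_column):
    # The executor combines two primitive calls: scan the outer table and
    # probe the inner table once per outer row.
    for row in outer.scan():
        match = inner.lookup(row[join_column])
        if match is not None:
            yield {**row, **match}


orders = InMemoryHandler(
    [{"order_id": 1, "user_id": 10},
     {"order_id": 2, "user_id": 11},
     {"order_id": 3, "user_id": 99}],
    key_column="order_id",
)
users = InMemoryHandler(
    [{"user_id": 10, "name": "ann"}, {"user_id": 11, "name": "bob"}],
    key_column="user_id",
)
joined = list(nested_loop_join(orders, users, "user_id"))
print(joined)
```

  Swapping InMemoryHandler for another class that implements the same two methods would not require any change to nested_loop_join, which is the essence of a pluggable engine interface.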

     2.6 Return the result to the client

  The last stage of query execution is returning the result to the client. Even when the query returns no data, MySQL still returns information about the query, such as the number of rows affected and the execution time. If the query cache is enabled and the query is cacheable, MySQL also stores the result in the cache at this point. Returning the result set is an incremental process: MySQL can start sending rows to the client as soon as it produces the first one. This way the server does not have to hold the full result in memory, and the client receives the first rows as early as possible. Note that each row in the result set is sent in a data packet that conforms to the client/server communication protocol and is then transmitted over TCP. During transmission, MySQL data packets may be buffered and sent in batches.
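  The incremental return with batching can be sketched using generators. This is only a model: produce_rows stands in for the execution engine, send_in_batches stands in for the network buffer, and the batch size of 3 is arbitrary.

```python
def produce_rows(count):
    # Rows are generated lazily, one at a time, as the executor finds them.
    for i in range(count):
        yield (i, f"row-{i}")


def send_in_batches(rows, batch_size):
    # Buffer rows and hand them over in groups, without ever holding the
    # complete result set in memory.
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush whatever is left at the end


batches = list(send_in_batches(produce_rows(7), batch_size=3))
print([len(batch) for batch in batches])
```

  Seven rows leave in batches of three, three and one, and at no point does the sender hold more than one batch.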

 

This article is summarized from: https://www.bilibili.com/video/BV12b411K7Zu?p=189

Borrowed from: https://www.cnblogs.com/andy6/p/5789254.html

https://zhuanlan.zhihu.com/p/105049719

https://blog.csdn.net/fuzhongmin05/article/details/70904190

Origin blog.csdn.net/qq_36756682/article/details/114416834