A comprehensive analysis of MySQL's file structure, logical architecture, and SQL execution process: still afraid you won't understand it?

1. MySQL file description

1.1. MySQL folders and files

After MySQL is installed on a Linux server, the following files are present:

  • auto.cnf: stores the server UUID, which uniquely identifies each MySQL instance
  • Directories (shown in blue in the listing): each database corresponds to one directory
  • hostname.log: the general query log file, named after the host; disabled by default
  • /var/log/mysqld.log: the error log file, configured in /etc/my.cnf with log-error=/var/log/mysqld.log
  • ib_logfile0, ib_logfile1: InnoDB redo log files, which take part in persisting data
  • ibdata1: data file that holds the InnoDB system tablespace

1.2. The main log files

1) Error log (errorlog)

It is enabled by default, and since 5.5.7 the error log cannot be turned off. It records all serious errors encountered while the server is running, as well as details of each MySQL startup and shutdown.

Default error log name: hostname.err.

What the error log records is controlled by log-error and log-warnings: log-error defines whether the error log is enabled and where it is stored, and log-warnings defines whether warnings are also written to the error log.

# Can be set directly to a file path, or to ON|OFF
log_error=/var/log/mysqld.log
# Can only be switched with 1|0; enabled by default
log_warnings=1

2) Binary log (bin log)

It is disabled by default (in MySQL 5.7 and earlier) and is enabled through the following configuration:

log-bin=mysql-bin

Here mysql-bin is the basename of the binlog files; a full binlog file name looks like mysql-bin.000001.

The binlog records all DDL and DML statements in the database, but not SELECT statements. Statements are stored as events that describe data changes in order, and the binlog also includes the execution time of each update statement. A DDL statement is written to the binlog directly, whereas a DML statement is written only when its transaction is committed.

Binlog is mainly used to implement MySQL master-slave replication, data backup, and data recovery.
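
As a rough illustration of the recording rule described above (DDL logged directly, DML logged only at commit, SELECT never logged), here is a minimal Python sketch. ToyBinlog and ToySession are hypothetical names invented for this example; it is a toy model under those assumptions, not how MySQL implements the binlog, and it ignores details such as event formats.

```python
class ToyBinlog:
    """Ordered list of change events; SELECT statements are never recorded."""

    def __init__(self):
        self.events = []


class ToySession:
    """Hypothetical client session writing to a shared ToyBinlog."""

    def __init__(self, binlog):
        self.binlog = binlog
        self.pending = []  # DML statements of the open transaction

    def execute(self, sql):
        verb = sql.split()[0].lower()
        if verb == "select":
            return  # reads are not logged
        if verb in ("create", "alter", "drop"):
            self.binlog.events.append(sql)  # DDL is recorded directly
        else:
            self.pending.append(sql)  # DML waits for commit

    def commit(self):
        self.binlog.events.extend(self.pending)
        self.pending = []

    def rollback(self):
        self.pending = []  # rolled-back DML never reaches the binlog


binlog = ToyBinlog()
session = ToySession(binlog)
session.execute("create table t (id int)")
session.execute("insert into t values (1)")
session.execute("select * from t")
before_commit = list(binlog.events)
session.commit()
after_commit = list(binlog.events)
```

Before the commit only the create statement is in the toy log; the insert appears after commit, and the select never does.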

3) General query log

The general query log is turned off by default.

Because the general query log records every user operation, including inserts, deletes, and updates, it generates a large volume of data under heavy concurrency and causes unnecessary disk I/O, which hurts MySQL performance. Unless you are debugging the database, leaving the general query log off is recommended.

mysql> show global variables like 'general_log'; 

To enable it:

# On/off switch
general_log={ON|OFF}
# Log file; if general_log_file is not specified, the default name is host_name.log
general_log_file=/PATH/TO/file
# Output type
log_output={TABLE|FILE|NONE}

4) Slow query log

It is off by default.

It is turned on with the following settings:

# Enable the slow query log
slow_query_log=ON
# Slow query threshold, in seconds
long_query_time=10
# Log file. If no file_name is given, the default is the host name with the suffix -slow.log. If a file name is given but is not an absolute path, the file is written to the data directory.
slow_query_log_file=file_name

The slow query log records every query whose execution time exceeds long_query_time seconds, which makes it easy to collect long-running SQL statements.

To see how many queries have exceeded the slow query threshold: SHOW GLOBAL STATUS LIKE '%Slow_queries%';
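
The threshold rule can be sketched in a few lines of Python. This is only a toy model of "execution time exceeds long_query_time"; the function name, the orders table, and the timings are made up for illustration.

```python
# Toy model of the slow-query threshold; names and sample data are hypothetical.
LONG_QUERY_TIME = 10.0  # seconds, mirrors long_query_time=10


def slow_queries(executed, threshold=LONG_QUERY_TIME):
    """Return the statements whose execution time exceeds the threshold."""
    return [sql for sql, seconds in executed if seconds > threshold]


executed = [
    ("select * from customer", 0.02),
    ("select * from orders where note like '%refund%'", 12.5),
    ("update customer set first_name = 'A' where customer_id = 14", 10.0),
]
slow = slow_queries(executed)
slow_count = len(slow)  # plays the role of the Slow_queries counter
```

Note that in this sketch a statement taking exactly 10 seconds is not counted, because the rule is "exceeds", not "reaches".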

1.3. Data file

Tables created by the MyISAM engine:

tablename.frm: table structure definition file

tablename.MYD: data file

tablename.MYI: index file

Tables created by the InnoDB engine:

tablename.frm: table structure definition file

tablename.ibd: data and index file
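
The mapping from engine to on-disk files listed above can be written down as a small lookup. The helper below is a hypothetical convenience for illustration, not a MySQL API.

```python
# File suffixes per engine, as listed above (hypothetical helper, not a MySQL API).
ENGINE_FILE_SUFFIXES = {
    "MyISAM": [".frm", ".MYD", ".MYI"],
    "InnoDB": [".frm", ".ibd"],
}


def table_files(table_name, engine):
    """Return the file names expected for a table under the given engine."""
    return [table_name + suffix for suffix in ENGINE_FILE_SUFFIXES[engine]]


innodb_files = table_files("customer", "InnoDB")
myisam_files = table_files("customer", "MyISAM")
```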

2. Logical architecture diagram

Connectors

  • Client connectors through which programs written in different languages interact with MySQL using SQL

Management Services & Utilities

  • System management and control tools

Connection Pool

  • Manages user connections and queues connection requests until they can be processed.
  • Listens for requests to MySQL Server, accepts connection requests, and forwards them all to the thread management module. Each client connected to MySQL Server is assigned (or has created for it) a connection thread that serves it exclusively.
  • The connection thread handles communication between MySQL Server and the client: it accepts the client's command requests and sends back the server's results. The thread management module manages and maintains these connection threads, including thread creation and the thread cache.

SQL Interface

  • Accepts the user's SQL commands and returns the results of the user's queries. For example, a select ... from statement goes through the SQL Interface.

Parser

  • SQL commands passed to the parser are validated and parsed.

Main functions:

  • Performs lexical and syntax analysis on the SQL statement and turns it into a syntax tree, then classifies the statement by operation type and forwards it to the appropriate next step. All later handling of the statement is based on this structure.
  • If an error is encountered during parsing, the SQL statement is invalid.

Optimizer: Query Optimizer

  • Before a query is executed, the query optimizer optimizes it. The execution plan shown by the explain statement is generated by the query optimizer.

Cache and Buffer: query cache

  • Its main function is to cache in memory the result set of a select request submitted by a client, keyed by a hash of the query. After any change to the data in a base table the query reads from, MySQL automatically invalidates the cached result. In an application with a very high read-to-write ratio, the query cache can improve performance significantly. Of course, it also consumes a lot of memory.
  • If the query cache holds a matching result, the query can fetch its data directly from the cache. The caching mechanism is made up of a series of smaller caches, such as the table cache, record cache, key cache, and permission cache.

Pluggable Storage Engines: Storage Engines

  • Unlike other databases such as Oracle and SQL Server, which have only one storage engine, MySQL has a "Pluggable Storage Engine Architecture", which means that the MySQL database provides a variety of storage engines.
  • Moreover, storage engines apply per table. Users can choose a storage engine for each table according to their needs, and can even write their own storage engine. In other words, different tables in the same database can use different storage engines: create table xxx() engine=InnoDB/Memory/MyISAM

In short, a storage engine defines how data is stored, how indexes are built on the stored data, and how data is updated and queried.
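
To make the "pluggable" idea concrete, here is a minimal Python sketch in which two toy engines store rows differently but expose the same interface, and each table picks its own engine. StorageEngine, ToyListEngine, ToyTextEngine, and the audit_log table are all hypothetical names for this illustration; MySQL's real engine interface is far richer.

```python
import json
from abc import ABC, abstractmethod


class StorageEngine(ABC):
    """Hypothetical engine interface: how rows are stored and read back."""

    @abstractmethod
    def insert(self, row):
        ...

    @abstractmethod
    def scan(self):
        ...


class ToyListEngine(StorageEngine):
    """Keeps rows as Python dictionaries in a list."""

    def __init__(self):
        self._rows = []

    def insert(self, row):
        self._rows.append(dict(row))

    def scan(self):
        return [dict(row) for row in self._rows]


class ToyTextEngine(StorageEngine):
    """Keeps rows as serialized JSON lines, a different physical layout."""

    def __init__(self):
        self._lines = []

    def insert(self, row):
        self._lines.append(json.dumps(row, sort_keys=True))

    def scan(self):
        return [json.loads(line) for line in self._lines]


# Each table picks its own engine, in the spirit of "create table ... engine=...".
tables = {"customer": ToyListEngine(), "audit_log": ToyTextEngine()}
for engine in tables.values():
    engine.insert({"id": 1, "name": "a"})

rows_per_table = {name: engine.scan() for name, engine in tables.items()}
```

The code that inserts and scans never needs to know which engine backs a table, which is the point of keeping the engine behind a common interface.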

3. MySQL Server layer components

3.1. SQL statement execution process

3.2. Connector

**First, you connect to the database, and the connector is what receives you. The connector is responsible for establishing connections with clients, obtaining permissions, and maintaining and managing connections.** The connection command is generally written like this:

mysql -h$ip -P$port -u$user -p

After entering the command, you will need to enter the password in the interactive dialog. Although the password can also be written on the command line directly after -p, this may lead to your password being leaked. If you are connecting to a production server, it is strongly recommended that you do not do this.

The mysql in the connection command is a client tool used to establish a connection with the server. After completing the classic TCP handshake, the connector will begin to authenticate your identity, this time using the username and password you entered.

  • If the username or password is incorrect, you will receive an "Access denied for user" error, and the client program ends.
  • If the username and password authentication is passed, the connector will check the permissions you have in the permission table. After that, the permission judgment logic in this connection will depend on the permission read at this time.

This means that once a user has successfully established a connection, modifying that user's permissions with an administrator account does not affect the permissions of the existing connection. Only connections created after the modification use the new permission settings.
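
This "permissions are read once at connect time" behavior can be modeled with a short Python sketch. The privilege_table dictionary and ToyConnection class are hypothetical stand-ins for illustration, not MySQL internals.

```python
# Hypothetical in-memory stand-in for the permission table.
privilege_table = {"app_user": {"select"}}


class ToyConnection:
    def __init__(self, user):
        # Permissions are copied once, when the connection is established.
        self.user = user
        self.privileges = set(privilege_table[user])

    def can(self, action):
        return action in self.privileges


old_conn = ToyConnection("app_user")
privilege_table["app_user"].add("update")  # an administrator grants a new permission
new_conn = ToyConnection("app_user")

old_can_update = old_conn.can("update")
new_can_update = new_conn.can("update")
```

In this model the existing connection still cannot update, while the connection created after the change can.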

After the connection is established, if there is no follow-up action, the connection stays idle, which you can see with the show processlist command. In its output, a row whose Command column shows "Sleep" is an idle connection.

If the client is inactive for too long, the connector will automatically disconnect it. This time is controlled by the parameter wait_timeout, the default value is 8 hours.

If the client sends the request again after the connection is disconnected, it will receive an error message: Lost connection to MySQL server during query. At this time, if you want to continue, you need to reconnect, and then execute the request.
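
The idle-timeout behavior and the reconnect-then-retry pattern can be sketched as follows. This is a toy simulation with explicit timestamps; ToyServerConnection and its methods are invented for this example and do not correspond to any MySQL client API.

```python
WAIT_TIMEOUT = 8 * 60 * 60  # seconds; the 8-hour default mentioned above


class ToyServerConnection:
    """Hypothetical connection whose idle time is tracked on the server side."""

    def __init__(self, now):
        self.last_active = now
        self.open = True

    def reap_if_idle(self, now, wait_timeout=WAIT_TIMEOUT):
        """Server side: close the connection if it has been idle too long."""
        if self.open and now - self.last_active > wait_timeout:
            self.open = False

    def query(self, sql, now):
        if not self.open:
            raise ConnectionError("Lost connection to MySQL server during query")
        self.last_active = now
        return "ok"


conn = ToyServerConnection(now=0)
conn.query("select 1", now=60)
conn.reap_if_idle(now=60 + WAIT_TIMEOUT + 1)  # idle for longer than wait_timeout

try:
    conn.query("select 1", now=60 + WAIT_TIMEOUT + 2)
    error = None
except ConnectionError as exc:
    error = str(exc)
    # Reconnect first, then execute the request again.
    conn = ToyServerConnection(now=60 + WAIT_TIMEOUT + 2)
    retry_result = conn.query("select 1", now=60 + WAIT_TIMEOUT + 3)
```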

In the database, a long connection means that after the connection is established, the client keeps using the same connection for as long as it has requests. A short connection means that the connection is closed after just a few queries, and a new one is established for the next query.

Establishing a connection is a relatively expensive process, so I suggest that you minimize how often you establish connections, that is, prefer long connections.

However, once you use only long connections, you may find that MySQL's memory usage sometimes grows very quickly. This is because the memory MySQL uses temporarily during execution is managed in the connection object, and these resources are released only when the connection is closed. If long connections accumulate, memory usage may grow so large that the system forcibly kills the process (OOM); what you observe is MySQL restarting abnormally.

How to solve this problem? You can consider the following two options.

  • Periodically disconnect long connections. After a connection has been in use for a while, or after the program determines that a large, memory-hungry query has just run, disconnect it and reconnect before the next query.
  • If you are using MySQL 5.7 or later, you can reinitialize connection resources by executing mysql_reset_connection after each large operation. This process does not require reconnection and re-authentication, but will restore the connection to the state it was just created.
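
The first option, periodically recycling a long connection, might look like the following Python sketch. RecyclingConnection and FakeConnection are hypothetical names; FakeConnection only stands in for a real client connection so that the example runs on its own, and the "reconnect after N queries" policy is just one possible choice.

```python
class FakeConnection:
    """Stand-in for a real client connection; counts how many were opened."""

    opened = 0

    def __init__(self):
        FakeConnection.opened += 1
        self.closed = False

    def execute(self, sql):
        return "ok"

    def close(self):
        self.closed = True


class RecyclingConnection:
    """Reconnect after max_queries statements to release per-connection memory."""

    def __init__(self, connect, max_queries):
        self._connect = connect  # factory that opens a new connection
        self._max_queries = max_queries
        self._conn = connect()
        self._count = 0

    def execute(self, sql):
        if self._count >= self._max_queries:
            self._conn.close()  # drop the old connection and its memory
            self._conn = self._connect()
            self._count = 0
        self._count += 1
        return self._conn.execute(sql)


conn = RecyclingConnection(FakeConnection, max_queries=2)
results = [conn.execute("select 1") for _ in range(5)]
connections_opened = FakeConnection.opened
```

With a limit of two queries per connection, five queries end up using three connections in this sketch.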

3.3. Query cache (MySQL 8.0 has removed the query cache)

After the connection is established, you can execute the select statement. The execution logic will come to the second step: query cache.

After MySQL gets a query request, it first checks the query cache to see whether this statement has been executed before. Previously executed statements and their results may be cached in memory as key-value pairs: the key is a hash of the query statement, and the value is the query result. If your query finds its key in this cache, the value is returned directly to the client.

If the statement is not in the query cache, it will continue to the next stage of execution. After the execution is complete, the execution result will be stored in the query cache. You can see that if the query hits the cache, MySQL can directly return the result without performing the complicated operations behind, which is very efficient.

But most of the time I would advise you not to use query cache, why? Because query caching often does more harm than good.

The query cache is very easy to invalidate. If a table is modified, all query caches related to this table will be cleared. For frequently modified tables, the cache hit rate will be very low. Therefore, only tables that are not frequently modified, such as the system configuration table, are suitable for query caching.
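
The key-value lookup and the "any change to a table clears every cached query on it" rule can be sketched in Python. ToyQueryCache is a hypothetical class written for this illustration, and the use of SHA-256 as the hash is an assumption; it is not the structure MySQL used internally.

```python
import hashlib


class ToyQueryCache:
    def __init__(self):
        self._results = {}   # hash of the statement text -> cached result
        self._by_table = {}  # table name -> keys of cached queries that read it

    @staticmethod
    def _key(sql):
        return hashlib.sha256(sql.encode("utf-8")).hexdigest()

    def get(self, sql):
        """Return the cached result, or None on a cache miss."""
        return self._results.get(self._key(sql))

    def put(self, sql, table, result):
        key = self._key(sql)
        self._results[key] = result
        self._by_table.setdefault(table, set()).add(key)

    def invalidate(self, table):
        """Called after any change to the table: drop every cached query on it."""
        for key in self._by_table.pop(table, set()):
            self._results.pop(key, None)


cache = ToyQueryCache()
sql = "select first_name from customer where customer_id=14"
cache.put(sql, "customer", [("Ann",)])
hit = cache.get(sql)
cache.invalidate("customer")  # e.g. after an update on customer
miss = cache.get(sql)
```

A single write to customer wipes every cached query on that table, which is why frequently modified tables get a very low hit rate.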

The query cache is off by default. To check the current setting:

show variables like 'query_cache_type';

To see the number of query cache hits:

SHOW STATUS LIKE 'Qcache_hits';

query_cache_type takes the following values:

A value of 0 or OFF disables the use of the cache.

A value of 1 or ON enables caching, except for statements beginning with SELECT SQL_NO_CACHE.

A value of 2 or DEMAND caches only statements beginning with SELECT SQL_CACHE.
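
The three settings amount to a small decision rule, sketched below in Python. should_cache is a hypothetical helper that only inspects the first two words of the statement; it is a simplification for illustration, not MySQL's actual check.

```python
def should_cache(sql, query_cache_type):
    """Decide whether a SELECT is cacheable under the given query_cache_type."""
    words = sql.strip().upper().split()
    if not words or words[0] != "SELECT":
        return False  # only select statements are candidates
    modifier = words[1] if len(words) > 1 else ""
    if query_cache_type in (0, "OFF"):
        return False
    if query_cache_type in (1, "ON"):
        return modifier != "SQL_NO_CACHE"
    if query_cache_type in (2, "DEMAND"):
        return modifier == "SQL_CACHE"
    raise ValueError("unknown query_cache_type: %r" % (query_cache_type,))


plain = "select * from customer"
opt_out = "select SQL_NO_CACHE * from customer"
opt_in = "select SQL_CACHE * from customer"

decisions = {
    "ON": [should_cache(s, 1) for s in (plain, opt_out, opt_in)],
    "DEMAND": [should_cache(s, 2) for s in (plain, opt_out, opt_in)],
    "OFF": [should_cache(s, 0) for s in (plain, opt_out, opt_in)],
}
```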

To enable the cache, add the following to the my.cnf configuration file:

query_cache_type=2 

So how do you clear the query cache?

  • FLUSH QUERY CACHE; // Clean up query cache memory fragmentation.
  • RESET QUERY CACHE; // Remove all queries from the query cache.
  • FLUSH TABLES; //Close all open tables, and this operation will clear the contents of the query cache.

However, the MySQL 8.0 version directly deleted the entire function of the query cache.

3.4. Analyzer

If the cache is not hit, continue to execute the statement. At this time, the statement is first parsed.

This stage is handled by MySQL's parser and preprocessor.

They first determine whether the syntax of the statement text is correct, and then extract the tables, columns, and query conditions from it. This is essentially the process of compiling an SQL statement, and it involves lexical analysis, syntax analysis, semantic analysis, and other stages.

The parser first performs "lexical analysis".

This splits the complete SQL statement into individual strings:

select customer_id,first_name,last_name from customer where customer_id=14; 

It is split into 10 strings:

select, customer_id, first_name, last_name, from, customer, where, customer_id, =, 14

From the keyword "select", MySQL recognizes that this is a query statement. It also recognizes the string "customer" as "table name customer" and the string "customer_id" as "column customer_id".
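
A minimal tokenizer in Python reproduces this split. It is a sketch under simplifying assumptions: the regular expression only knows identifiers, integers, and "=", and it silently drops commas and the semicolon so that the output matches the 10 strings listed above. A real lexer handles far more, and the names here are hypothetical.

```python
import re

# Identifiers/keywords, integers, and the "=" operator; everything else is skipped.
TOKEN_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|=")
KEYWORDS = {"select", "from", "where"}


def tokenize(sql):
    """Split a statement into the strings the lexer would hand to the parser."""
    return TOKEN_PATTERN.findall(sql)


def classify(token):
    """Give each token a coarse category."""
    if token.lower() in KEYWORDS:
        return "keyword"
    if token.isdigit():
        return "number"
    if token == "=":
        return "operator"
    return "identifier"


sql = "select customer_id,first_name,last_name from customer where customer_id=14;"
tokens = tokenize(sql)
kinds = [classify(token) for token in tokens]
```

Deciding that "customer" is a table name and "customer_id" a column name depends on where the identifier appears relative to the keywords, which is the job of the later stages rather than of this tokenizer.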

The parser then performs "syntax analysis".

In this step, the syntax analyzer takes the result of lexical analysis and checks whether it conforms to MySQL syntax.

If the syntax is correct, a parse tree is generated according to the grammar rules defined by MySQL, for example for this SQL statement:

select customer_id,first_name,last_name from customer where customer_id=14; 

Preprocessor

The preprocessor further checks whether the parse tree is valid, for example whether the table and the columns exist. It also checks whether the user has permission to operate on the table.
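
The existence checks can be sketched against a tiny in-memory catalog. SCHEMA and preprocess are hypothetical, and the error messages are illustrative only; they are not MySQL's actual error texts.

```python
# Hypothetical catalog: table name -> set of column names.
SCHEMA = {"customer": {"customer_id", "first_name", "last_name"}}


def preprocess(table, columns, schema=SCHEMA):
    """Check that the table and every referenced column exist in the catalog."""
    if table not in schema:
        raise LookupError("no such table: %s" % table)
    for column in columns:
        if column not in schema[table]:
            raise LookupError("no such column: %s.%s" % (table, column))
    return True


valid = preprocess("customer", ["customer_id", "first_name", "last_name"])

try:
    preprocess("customer", ["middle_name"])
    column_error = None
except LookupError as exc:
    column_error = str(exc)
```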

3.5. Optimizer

Before the SQL statement is executed, it is also processed by the optimizer.

The query optimizer generates different execution plans from the parse tree and then selects the optimal one. MySQL uses a cost-based optimizer: it picks whichever execution plan has the lowest estimated cost. The cost is the sum of io_cost and cpu_cost, and it is a common indicator for evaluating how efficiently a query executes.

# Check the cost of the last query
show status like 'Last_query_cost';

Optimization processing by the optimizer:

  • When multiple indexes are available, decide which index to use.
  • When a statement joins multiple tables, decide the join order of the tables and which table to use as the reference table.
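
Cost-based selection, in its simplest form, means computing io_cost + cpu_cost for each candidate plan and keeping the cheapest. The Python sketch below shows only that idea; the Plan class, the plan names, and every cost figure are made up for illustration and do not come from a real optimizer.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    io_cost: float
    cpu_cost: float

    @property
    def total_cost(self):
        # The plan's cost is the sum of its I/O and CPU components.
        return self.io_cost + self.cpu_cost


def choose_plan(plans):
    """Pick the candidate plan with the lowest total cost."""
    return min(plans, key=lambda plan: plan.total_cost)


# Hypothetical candidates with invented cost figures.
plans = [
    Plan("full table scan", io_cost=120.0, cpu_cost=60.0),
    Plan("lookup via index on customer_id", io_cost=1.0, cpu_cost=0.2),
]
best = choose_plan(plans)
```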

3.6. Executor

Through the analyzer, MySQL knows what you want to do; through the optimizer, it knows how to do it and has a query plan. It then enters the executor phase and starts executing the statement.

(1) At the start of execution, MySQL first checks whether you have permission to query the table customer. If not, a permission error is returned. (In the actual implementation, if the query cache is hit, permission verification is done when the query cache returns the result.)

(2) If you have permission, use the specified storage engine to open the table to start the query. The executor will use the query interface provided by the engine to extract data according to the engine definition of the table.

For example, in the table customer in our example, the customer_id field is the primary key, then the execution flow of the executor is as follows:

  1. Call the InnoDB engine interface to retrieve the record with customer_id=14 from the primary key index.
  2. An equality lookup on the primary key index matches at most one record, which is returned directly to the client.

At this point, the statement has been executed.

If the customer_id field has no index, the query can only perform a full table scan. The execution flow of the executor is then as follows:

  1. Call the InnoDB engine interface to get the first row of the table and check whether its customer_id value is 14. If not, skip it; if so, add the row to the result set.
  2. Call the engine interface to fetch the "next row" and repeat the same check, until the last row of the table has been fetched.
  3. The executor returns to the client a result set consisting of all the rows that satisfied the condition during the traversal.

At this point, the statement has been executed.
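
The full-table-scan loop described above can be sketched in Python as an executor that pulls rows from an engine one at a time. ToyEngine, read_first, and read_next are hypothetical names chosen for this illustration, and the sample rows are invented; they are not the real InnoDB interface or data.

```python
class ToyEngine:
    """Hands rows to the executor one at a time, like an engine cursor."""

    def __init__(self, rows):
        self._rows = rows
        self._pos = 0

    def read_first(self):
        self._pos = 0
        return self.read_next()

    def read_next(self):
        if self._pos >= len(self._rows):
            return None  # past the last row of the table
        row = self._rows[self._pos]
        self._pos += 1
        return row


def full_table_scan(engine, predicate):
    """Executor loop: fetch every row, keep the ones that satisfy the condition."""
    result_set = []
    row = engine.read_first()
    while row is not None:
        if predicate(row):
            result_set.append(row)
        row = engine.read_next()
    return result_set


rows = [
    {"customer_id": 13, "first_name": "Ann"},
    {"customer_id": 14, "first_name": "Bob"},
    {"customer_id": 15, "first_name": "Cho"},
]
matches = full_table_scan(ToyEngine(rows), lambda row: row["customer_id"] == 14)
```

Every row is fetched and checked even though only one matches, which is what makes the scan expensive compared with the primary key lookup.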

Author: Running Hairball
Link: https://juejin.cn/post/6914301427355484174
Source: Juejin

Origin blog.csdn.net/m0_67645544/article/details/124454041