(1) MySQL history and the execution process of a SQL query

1. The development history of MySQL

Time            Milestone
1996            MySQL 1.0 is released. Its history can be traced back to 1979, when its author Monty designed a reporting tool in BASIC.
October 1996    3.11.1 is released. MySQL has no 2.x version.
2000            ISAM is upgraded to the MyISAM engine. MySQL becomes open source.
2003            MySQL 4.0 is released, integrating the InnoDB storage engine.
2005            MySQL 5.0 is released, providing features such as views and stored procedures.
2008            MySQL AB is acquired by Sun, beginning the Sun MySQL era.
2009            Oracle acquires Sun, beginning the Oracle MySQL era.
2010            MySQL 5.5 is released, and InnoDB becomes the default storage engine.
2016            MySQL 8.0.0 is released. Why is there no 6 or 7? 5.6 can be regarded as 6.x, and 5.7 as 7.x.

Because MySQL is open source (there is also a commercial version), many branches have been developed on top of the stable versions of MySQL, much as Linux has Ubuntu, RedHat, CentOS, Fedora, and Debian.

The MySQL branch most people are familiar with is probably MariaDB, because CentOS 7 ships with MariaDB. Where did it come from? After Oracle acquired MySQL, Monty, one of the founders of MySQL, was worried about the future of MySQL development (slow development, a closed process, and the possibility of going closed source), so he created the MariaDB branch (2009). It introduced the new Maria storage engine, an upgraded version of the original MyISAM storage engine.

Other popular branches:
Percona Server is one of the important branches of MySQL. Building on the InnoDB storage engine, it improved performance and ease of management, and this work eventually became the enhanced XtraDB engine, which makes better use of server hardware performance.

There are also some MySQL branches and self-developed storage engines in China, such as InnoSQL from NetEase and ArkDB from Jishu Yunzhou.

There are various ways to operate the database: the command line on a Linux system, a database tool such as Navicat, or a program that uses, for example, the JDBC API of the Java language or an ORM framework.

Have you ever thought about what actually happened when our tool or program was connected to the database? How does it work internally?

Taking a query statement as an example, let's look at the MySQL workflow.

2. The execution process of a SQL query statement

2.1 Connection

When our program or tool wants to operate the database, what is the first step? Establishing a connection with the database.

The MySQL service listens on port 3306 by default. A client can connect to the server in many ways: synchronously or asynchronously, over a long-lived or a short-lived connection, via TCP or a Unix socket. MySQL has a dedicated connection module, and the client's identity and permissions are verified when it connects.

How do we check how many connections MySQL currently has?

You can use the show status command with a pattern match on Thread:

show global status like 'Thread%';

Field               Meaning
Threads_cached      Number of threads kept in the thread cache
Threads_connected   Number of connections currently open
Threads_created     Number of threads created to handle connections
Threads_running     Number of threads that are not sleeping, which usually reflects the number of concurrent connections

Question: Why do we look at threads to find the number of connections? What is the relationship between a client connection and a server thread?

Each time a client opens a connection (a session), the server creates a thread to handle it, or reuses one from the thread cache. Conversely, to kill a session you KILL its thread.
Because a thread is allocated for each connection, keeping connections open consumes server resources. MySQL therefore automatically disconnects connections that have been inactive (in the SLEEP state) for too long.

There are two parameters:

show global variables like 'wait_timeout'; -- timeout for non-interactive connections, such as JDBC programs
show global variables like 'interactive_timeout'; -- timeout for interactive connections, such as database tools

The default for both is 28800 seconds (8 hours).
Since connections consume resources, what is the default maximum number of connections (that is, concurrent number) allowed by the MySQL service?

In version 5.7, the default is 151, and the maximum it can be set to is 100,000.

show variables like 'max_connections';

A note on parameter levels:
Parameters (variables) in MySQL have two levels, session and global, which take effect in the current session and across the whole server respectively. Not every parameter has both levels; for example, max_connections exists only at the global level.

When no level is specified, the default is the session level, for both queries and modifications.
For example, after you modify a parameter, the new value is already visible to queries in the same window but not in other windows:

show variables like 'autocommit';
set autocommit = on;

Therefore, if a change is only temporary, make it at the session level. If it needs to take effect in other sessions, you must explicitly add the global keyword.
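The two levels can be pictured with a small Python model. This is only a sketch under assumptions: the Server and Session classes and their methods are invented for illustration and are not MySQL code. It assumes that each new session starts from a copy of the global values, which is why a global change here shows up in sessions opened afterwards.

```python
class Server:
    """Holds the global values (hypothetical model, not MySQL source)."""

    def __init__(self):
        self.global_vars = {"autocommit": "ON"}

    def connect(self):
        return Session(self)


class Session:
    def __init__(self, server):
        self._server = server
        # Each new session starts with a copy of the current global values.
        self._vars = dict(server.global_vars)

    def set(self, name, value, scope="session"):
        if scope == "global":
            self._server.global_vars[name] = value
        else:
            self._vars[name] = value

    def show(self, name):
        return self._vars[name]


server = Server()
window_a = server.connect()
window_b = server.connect()

window_a.set("autocommit", "OFF")            # like: set autocommit = off;
in_a = window_a.show("autocommit")           # changed in this window
in_b = window_b.show("autocommit")           # other window is unaffected

window_a.set("autocommit", "OFF", "global")  # like: set global autocommit = off;
window_c = server.connect()
in_c = window_c.show("autocommit")           # a new session picks up the global value
```

In this model, window_a sees the change immediately, window_b keeps its old value, and only a session opened after the global change starts with the new value.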

The client has now established a connection with the server. When it executes a query statement, what is the next step?

2.2 Query cache

MySQL comes with a cache module.
Consider a question: a table has 5 million rows and no index. If I execute exactly the same SQL statement twice, will the second run be faster?

Answer: No, at least not because of this cache. MySQL's cache is limited in size, and it cannot cache 5 million rows at once.

Ask again:

select * from user u where u.name = 'xhc';

Will the above SQL statement use the cache?
The answer is no. Why? Because MySQL's query cache is turned off by default.

show variables like 'query_cache%';

Being off by default means that it is not recommended. Why does MySQL not recommend its built-in cache? The main reason is that its applicable scenarios are limited. First, it requires SQL statements to be exactly identical: an extra space in the middle or a difference in letter case makes two statements count as different SQL.

Second, when any row in a table changes, all cached results for that table become invalid, so it is unsuitable for applications with frequent data updates.
Therefore, caching is better left to the ORM framework (for example, MyBatis enables its first-level cache by default) or to an independent cache service such as Redis.
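The two limitations are easy to see in a toy model. The sketch below is an assumption-laden illustration in Python, not MySQL's implementation: the QueryCache class and its get, put, and invalidate methods are hypothetical names. It keys results by the exact SQL text and drops every cached result for a table as soon as that table is written.

```python
class QueryCache:
    """Toy result cache keyed by the exact SQL text (hypothetical, not MySQL code)."""

    def __init__(self):
        self._results = {}   # exact SQL text -> cached rows
        self._by_table = {}  # table name -> set of SQL texts that read it

    def get(self, sql):
        # No normalisation: spacing and letter case are part of the key.
        return self._results.get(sql)

    def put(self, sql, table, rows):
        self._results[sql] = rows
        self._by_table.setdefault(table, set()).add(sql)

    def invalidate(self, table):
        # Any write to the table drops every cached result that read it.
        for sql in self._by_table.pop(table, set()):
            self._results.pop(sql, None)


cache = QueryCache()
cache.put("select * from user where name = 'xhc'", "user", [("xhc",)])

hit = cache.get("select * from user where name = 'xhc'")          # identical text
miss_case = cache.get("SELECT * from user where name = 'xhc'")    # different case
miss_space = cache.get("select *  from user where name = 'xhc'")  # extra space

cache.invalidate("user")  # simulates any update to the user table
after_write = cache.get("select * from user where name = 'xhc'")
```

Only the byte-for-byte identical statement hits the cache, and after a single write to the table even that entry is gone.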

In MySQL 8.0, the query cache has been removed.

2.3 Syntax parsing and preprocessing (Parser & Preprocessor)

The main job of this step is to perform lexical analysis, syntax analysis, and semantic analysis on the statement based on the SQL grammar.

2.3.1 Lexical analysis

Lexical analysis is to break a complete SQL statement into individual words. For example, a simple SQL statement:

select name from user where id = 1;

It is broken into 8 symbols (tokens), and for each one the lexer records its type and where it starts and ends.

2.3.2 Syntax analysis

The second step is syntax analysis, which performs some grammatical checks on the SQL, such as whether single quotes are closed, and then builds a data structure from the SQL statement according to the grammar rules defined by MySQL. We call this data structure a parse tree (select_lex).

Lexical and syntax analysis is a very basic capability. The Java compiler and the Baidu search engine also need it if they want to recognize statements.
Any database middleware that parses SQL to perform routing, such as Mycat or Sharding-JDBC, must also have lexical and syntax analysis.

Question: If I write SQL that is lexically and grammatically correct, but the table name or a field does not exist, where is the error reported? In the parser or in the executor? For example:

select * from xhc;

In fact, the error is reported during parsing. The parsing stage includes a preprocessor, which checks the generated parse tree and resolves semantics that the parser itself cannot. For example, it checks that table and column names exist, and checks names and aliases to ensure there is no ambiguity.
After preprocessing, a new parse tree is obtained.
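The existence check can be sketched as follows. This is a simplified Python illustration under assumptions: the SCHEMA dictionary stands in for the server's table metadata, and SemanticError and preprocess are hypothetical names, with error messages only loosely modelled on what a client would see.

```python
class SemanticError(Exception):
    """Raised when a name in the statement cannot be resolved."""


# Assumed catalog: table name -> set of column names.
SCHEMA = {"user": {"id", "name"}}


def preprocess(table, columns, schema=SCHEMA):
    """Check that the table and every referenced column exist."""
    if table not in schema:
        raise SemanticError(f"Table '{table}' doesn't exist")
    for column in columns:
        if column != "*" and column not in schema[table]:
            raise SemanticError(f"Unknown column '{column}'")
    # A real preprocessor would return an annotated parse tree;
    # here the resolved names are returned as-is.
    return table, list(columns)


resolved = preprocess("user", ["name"])  # passes the check

try:
    preprocess("xhc", ["*"])             # select * from xhc;
    error = None
except SemanticError as exc:
    error = str(exc)
```

The statement against the missing table xhc is rejected at this stage, before any execution plan is built.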

2.4 Query Optimizer and query execution plan

2.4.1 What is an optimizer?

After the parse tree is obtained, is the SQL statement executed right away?
Two questions first: is there only one way to execute a SQL statement? And is the SQL that the database finally executes exactly the SQL we sent?

The answer to both is no. A SQL statement can be executed in many ways that all return the same result, so they are equivalent. But if there are so many ways to execute it, how are they produced, which one is chosen in the end, and by what criteria?

This is the MySQL query optimizer module (Optimizer).

The purpose of the query optimizer is to generate different execution plans (Execution Plan) from the parse tree and then select the optimal one. MySQL uses a cost-based optimizer: whichever execution plan has the lowest cost is the one that is used.
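The selection rule itself is simple, even though estimating the costs is not. The Python sketch below only illustrates "pick the cheapest plan": the Plan class, the candidate plans, and all cost figures are made up for the example and bear no relation to MySQL's real cost model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    description: str
    rows_examined: int   # estimated rows the plan has to read
    cost_per_row: float  # assumed relative cost of reading one row

    @property
    def cost(self):
        return self.rows_examined * self.cost_per_row


def choose_plan(plans):
    """Cost-based choice: return the plan with the lowest estimated cost."""
    return min(plans, key=lambda plan: plan.cost)


# Hypothetical candidates for: select name from user where id = 1;
candidates = [
    Plan("full table scan", 5_000_000, 1.0),
    Plan("range scan on a secondary index", 20_000, 2.0),
    Plan("lookup via primary key", 1, 1.0),
]

best = choose_plan(candidates)
print(best.description, best.cost)
```

With these invented numbers the primary key lookup wins. In a real server the estimates come from table and index statistics, so a poor estimate can lead to a poor choice.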

You can use this command to view the cost of the query:

show status like 'Last_query_cost';

2.4.2 What can the optimizer do?

What optimization types can MySQL's optimizer handle?

  1. When we run a join query over multiple tables, which table's data is used as the base table.
  2. When multiple indexes are available, which index to choose.

In fact, for every database, the optimizer module is indispensable. They use complex algorithms to achieve the goal of optimizing query efficiency as much as possible.

However, the optimizer is not a panacea. It cannot automatically fix every inefficient SQL statement, nor does it select the optimal execution plan every time, so you still need to take care when writing SQL.

What do you get after optimization? The optimizer eventually turns the parse tree into a query execution plan, which is a data structure.

How do you view MySQL's execution plan? For example, when several tables are joined, which table is queried first? Which indexes might be used when executing the query, and which are actually used?

MySQL provides a tool for viewing execution plans. Add EXPLAIN in front of the SQL statement and you can see the execution plan information.

EXPLAIN select name from user where id=1;

If you want more detailed information, you can also use FORMAT=JSON, or enable optimizer trace.

EXPLAIN FORMAT=JSON select name from user where id=1;

2.5 Storage Engine

We know that MySQL has many storage engines, such as MyISAM, Memory, and InnoDB. How do a MyISAM table and an InnoDB table each store their data?

show variables like 'datadir';

By default, each database has its own folder under the data directory; take the test database as an example. In MySQL 5.7, a table of any storage engine has a .frm file, which is the table structure definition file.

Different storage engines store data in different ways and generate different files. Besides the .frm file, Memory has no other file, InnoDB has one (.ibd), and MyISAM has two (.MYD and .MYI).

Here we have a few questions:

  1. How is the table type selected? Can it be modified?
  2. Why does MySQL support so many storage engines? Is one not enough?
  3. What is the difference between these different storage engines?

2.5.1 Storage Engine Selection

The storage engine of a table is specified when the table is created, using the ENGINE keyword.

CREATE TABLE `user_innodb` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`gender` tinyint(1) DEFAULT NULL,
`phone` varchar(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `comidx_name_phone` (`name`,`phone`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4;

We often write CREATE TABLE statements without specifying a storage engine.

In that case, the database uses the default storage engine. Before 5.5.5 the default storage engine was MyISAM; from 5.5.5 on it is InnoDB.

What is the difference between so many storage engines?

Imagine: if a table needs very fast access and persistence is not a concern, should its data be kept in memory?
If a table is used to archive historical data, which needs no modification and no index, should it support data compression?
If a table is used by a business with many concurrent reads and writes, should reads and writes not interfere with each other, and should data consistency be relatively high?

By now the reason for supporting so many storage engines should be clear: different businesses have different requirements, and no single storage engine can provide every feature.

2.5.2 Introduction to common storage engines

  • MyISAM (3 files)
    Has a relatively narrow range of applications. Table-level locking limits read/write performance, so in web and data warehouse configurations it is usually used for read-only or read-mostly workloads.
    Features: Supports table-level locking (inserts and updates lock the table).
    Advantages: Fast insert and query (select) speed. The table's row count is stored, so count is faster. (How do you quickly insert 1 million rows into the database? One approach is to insert the data using MyISAM first and then change the storage engine to InnoDB.)
    Disadvantages: Does not support transactions.
    Suitable for: read-only workloads such as data analysis projects.
  • InnoDB (2 files)
    The default storage engine in MySQL 5.7. InnoDB is a transaction-safe (ACID-compliant) MySQL storage engine with commit, rollback, and crash recovery capabilities to protect user data. InnoDB row-level locking improves multi-user concurrency and performance. InnoDB stores user data in a clustered index to reduce I/O for common queries based on primary keys. To maintain data integrity, InnoDB also supports foreign key referential integrity constraints.
    Features: 1. Supports transactions and foreign keys, so data integrity and consistency are higher.
    2. Supports row-level locks and table-level locks.
    3. Supports concurrent reads and writes; writes do not block reads (MVCC).
    4. Its special index storage method reduces I/O and improves query efficiency.
    Suitable for: frequently updated tables, and business systems with concurrent reads and writes or transaction processing.

A short story:

InnoDB was originally developed by Innobase Oy, which cooperated with MySQL AB to open source the InnoDB code. Unexpectedly, MySQL's rival Oracle then acquired Innobase Oy. Later, in 2008, Sun (the company that developed the Java language) acquired MySQL AB, and in 2009 Sun was acquired by Oracle, so MySQL and InnoDB ended up in the same family again. Some people think MySQL is becoming more and more like Oracle, and this is the reason.

  • Memory (1 file)
    Stores all data in RAM for fast access in environments where non-critical data needs to be looked up quickly. This engine was previously called the HEAP engine. Its use cases are declining: InnoDB and its buffer pool memory area provide a general and durable way to keep most or all of the data in memory, and NDBCLUSTER provides fast key-value lookups for large distributed data sets.
    Features:
    Data is kept in memory, so reads and writes are very fast, but all data is lost if the database restarts or crashes. Only suitable for temporary tables.
  • CSV (3 files)
    Its table is actually a text file with comma-separated values. A CSV table allows data to be imported or dumped in CSV format, to exchange data with scripts and applications that read and write the same format. Because a CSV table has no index, data is usually kept in an InnoDB table during normal operation, and the CSV table is used only during the import or export stage.
    Features: Blank lines are not allowed, and indexes are not supported. The format is universal and can be edited directly, which makes it suitable for importing and exporting between different databases.
  • Archive (2 files)
    These compact, non-indexed tables are used to store and retrieve large amounts of rarely referenced historical, archive, or security audit information.
    Features: Does not support indexes, and does not support update or delete.

These are some common storage engines in MySQL. We have seen that different storage engines provide different features. They have different storage mechanisms, indexing methods, locking levels and other functions.

We have different requirements for data operations in different business scenarios, so we can choose different storage engines to meet our needs. This is why MySQL supports so many storage engines.

2.5.3 How to choose a storage engine?

  • If you have high requirements for data consistency and need transaction support, you can choose InnoDB.
  • If there are more data queries and fewer updates, and higher query performance requirements, you can choose MyISAM.
  • If you need a temporary table for query, you can choose Memory.
  • If no storage engine meets your needs and you have sufficient technical capability, you can develop your own storage engine in C/C++, following the internals manual on the official website:
    https://dev.mysql.com/doc/internals/en/custom-engine.html
    Following this development specification, you implement the corresponding interfaces so that the executor can operate on them.

In other words, why can MySQL support so many storage engines, and even custom ones? Changing a table's storage engine has no effect on how the server accesses it, because every engine follows the same specification and provides the same operation interface.
Each storage engine has its own service. For InnoDB, for example, you can view its status:

show engine innodb status;

These storage engines manage data files in different ways, provide different features, but provide the same interface for the upper layer.

2.6 Query Execution Engine

Having covered the storage engine, we know how data is stored. So who uses the execution plan to operate the storage engine?

This is the execution engine, which calls the API provided by the storage engine to complete the operations.
Why can we change a table's storage engine without changing how we operate on the table? Because storage engines with different capabilities all implement the same API. Finally, the data is returned to the client.
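The "same API, different storage" idea can be sketched with an abstract interface. The Python below is an illustration only: StorageEngine, MemoryEngine, CsvEngine, and execute_select are hypothetical names and bear no relation to MySQL's actual handler interface. The point is that the executor is written once against the interface and never needs to know which engine sits behind it.

```python
from abc import ABC, abstractmethod


class StorageEngine(ABC):
    """Interface the executor relies on (hypothetical, not MySQL's handler API)."""

    @abstractmethod
    def insert(self, row):
        """Store one row."""

    @abstractmethod
    def scan(self):
        """Yield every stored row as a tuple."""


class MemoryEngine(StorageEngine):
    """Keeps rows as tuples in a Python list."""

    def __init__(self):
        self._rows = []

    def insert(self, row):
        self._rows.append(tuple(row))

    def scan(self):
        return iter(self._rows)


class CsvEngine(StorageEngine):
    """Keeps rows as comma-separated text lines, like a CSV file would."""

    def __init__(self):
        self._lines = []

    def insert(self, row):
        self._lines.append(",".join(row))

    def scan(self):
        for line in self._lines:
            yield tuple(line.split(","))


def execute_select(engine, predicate):
    """The executor only talks to the interface, never to a concrete engine."""
    return [row for row in engine.scan() if predicate(row)]


results = []
for engine in (MemoryEngine(), CsvEngine()):
    engine.insert(("1", "xhc"))
    engine.insert(("2", "tom"))
    results.append(execute_select(engine, lambda row: row[0] == "1"))
```

Both engines return the same rows for the same query, although one stores tuples and the other stores text lines. Swapping the engine changes how data is kept, not how it is accessed.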

Summary

In general, we can divide MySQL into three layers:

  • The connection layer that interfaces with the client.
  • The service layer that actually performs the operation.
  • The storage engine layer that deals with hardware.
  1. Connection layer
    If our client wants to connect to port 3306 of the MySQL server, it must establish a connection with the server. The server then manages all connections and verifies the identity and permissions of the client. These functions are completed in the connection layer.
  2. Service layer
    The connection layer hands the SQL statements over to the service layer, which contains a series of steps:
    checking the query cache, calling the corresponding interface according to the SQL, and performing lexical and syntax analysis of our SQL statements (how keywords are identified, how aliases are identified, whether there are any syntax errors, and so on).
    Then comes the optimizer. MySQL optimizes our SQL statements according to certain rules (the principle of least cost) and finally hands them to the executor to execute.
  3. Storage engine layer
    The storage engine is where our data is actually stored. MySQL supports different storage engines.
    Below it is the memory or disk.

Origin blog.csdn.net/nonage_bread/article/details/112712668