Summary of frequently asked MySQL interview questions and answers

1. Transaction properties (ACID)

Atomicity

A transaction is regarded as the smallest indivisible unit. All operations of a transaction are either committed successfully together or rolled back together after a failure.

Rollback can be implemented using the undo log. The undo log records the modifications performed by the transaction, and a rollback simply applies these modifications in reverse.

Consistency

The database remains in a consistent state before and after a transaction is executed. In a consistent state, all transactions that read the same data get the same result.

Isolation

Modifications made by one transaction are not visible to other transactions until they are finally committed.

Durability

Once a transaction is committed, the changes made will be permanently saved in the database. Even if the system crashes, the results of transaction execution cannot be lost.


2. Issues related to the four isolation levels of transactions

Four types of data concurrency problems

Dirty Write

A dirty write (lost update) means that one transaction's update is overwritten by another transaction's update. For example, two transactions T1 and T2 both modify the same data: T1 modifies it first and commits, T2 modifies it afterwards, and T2's modification overwrites T1's.

Dirty Read

A dirty read means that the current transaction can read data that another transaction has not yet committed. For example, T1 modifies a piece of data but does not commit, and T2 then reads it. If T1 rolls back the modification, the data read by T2 is dirty data.

Non-Repeatable Read

A non-repeatable read occurs when a transaction reads the same data set more than once and, before it ends, another transaction modifies that data set, so the two reads by the first transaction may return different results. For example, T2 reads a piece of data and T1 then modifies it. If T2 reads the data again, the result differs from the first read.
The difference between a non-repeatable read and a dirty read is the number of reads: a dirty read involves one read, while a non-repeatable read involves two.

Phantom

A phantom read is essentially a special case of a non-repeatable read. T1 reads a range of data, T2 inserts new data into that range, and T1 reads the range again. The result of the second read differs from the first.
The difference between a phantom read and a non-repeatable read is whether the other transaction modifies existing data or inserts new data.

Four isolation levels

The problems above can all occur when transactions execute concurrently, and they differ in severity. Ranked from most to least severe:

Dirty write > Dirty read > Non-repeatable read > Phantom read

Read uncommitted

READ UNCOMMITTED: At this isolation level, a transaction can see the results of other transactions that have not yet committed.
Dirty reads, non-repeatable reads, and phantom reads cannot be avoided.

Read committed

READ COMMITTED: This level satisfies the simple definition of isolation: a transaction can only see changes made by transactions that have already committed. In other words, modifications made by one transaction are not visible to other transactions until they are committed. This is the default isolation level for most database systems (but not for MySQL).
Dirty reads can be avoided, but the problems of non-repeatable reads and phantom reads still exist.

Repeatable read

REPEATABLE READ: After transaction A reads a piece of data, even if transaction B modifies that data and commits, transaction A still reads the original content when it reads the data again. In other words, reading the same data multiple times within one transaction is guaranteed to return the same result.
Dirty reads and non-repeatable reads can be avoided, but the problem of phantom reads still exists. This is the default isolation level of MySQL.

Serializable

SERIALIZABLE: This level ensures that a transaction reads the same rows from a table every time. For the duration of the transaction, other transactions are prohibited from performing insert, update, and delete operations on the table. Transactions are forced to execute serially, so they do not interfere with each other and no concurrency consistency problems arise.
Dirty reads, non-repeatable reads, and phantom reads are all avoided, but performance is very low.
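
The four levels described above can be summarized as a small lookup table. The following is only an illustrative Python sketch (not MySQL code); the names ANOMALIES_ALLOWED and prevents are made up for this example, and the contents simply restate what this section says about each level.

```python
# Anomalies that each isolation level still allows, as stated in this section.
# ANOMALIES_ALLOWED and prevents() are hypothetical names used for illustration.
ANOMALIES_ALLOWED = {
    "READ UNCOMMITTED": {"dirty read", "non-repeatable read", "phantom read"},
    "READ COMMITTED": {"non-repeatable read", "phantom read"},
    "REPEATABLE READ": {"phantom read"},
    "SERIALIZABLE": set(),
}


def prevents(level, anomaly):
    """Return True if the given isolation level rules out the anomaly."""
    return anomaly not in ANOMALIES_ALLOWED[level]


for level, allowed in ANOMALIES_ALLOWED.items():
    print(level, "still allows:", sorted(allowed))
```

Each stricter level removes one more anomaly from the set, which mirrors the severity ranking given earlier.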

Solutions to concurrency problems

How can dirty reads, non-repeatable reads, and phantom reads be solved? There are two alternative approaches:

Option 1: Read operations use multi-version concurrency control (MVCC, explained below), and write operations use locking.
Option 2: Both read and write operations use locking.

Comparison:
With MVCC, read-write operations do not conflict with each other, so performance is higher.
With locking, read-write operations must queue up and execute one after another, which hurts performance.

Under normal circumstances we prefer MVCC for handling concurrent read-write operations, but in some special cases the business requires locking. The following describes the different types of locks in MySQL.

Locks

Lock granularity

MySQL provides two lock granularities: row-level locks and table-level locks.

You should try to lock only the data that needs to be modified, rather than all resources. The less data is locked, the lower the chance of lock contention and the higher the system's concurrency.

However, locking consumes resources, and every lock operation (acquiring a lock, releasing a lock, checking lock status) adds system overhead. Therefore, the finer the lock granularity, the greater the system overhead.

When choosing a locking granularity, a trade-off needs to be made between lock overhead and concurrency.

Lock types

Read-write locks

Exclusive lock, abbreviated as X lock, also known as a write lock.
Shared lock, abbreviated as S lock, also known as a read lock.

Two rules apply:

(1) If a transaction adds an X lock to data object A, it can read and update A. While the lock is held, other transactions cannot add any lock to A.

(2) If a transaction adds an S lock to data object A, it can read A but cannot update it. While the lock is held, other transactions can add an S lock to A, but cannot add an X lock.

The compatibility of the two locks is as follows:

      |  X   |  S
  X   |  No  |  No
  S   |  No  |  Yes

Intention locks

Intention locks make it easier to support multi-granularity locking.

When row-level and table-level locks coexist and transaction T wants to add an X lock to table A, it must first check whether any other transaction has locked table A or any row in table A. Without further help, this means checking every row in table A, which is very time-consuming.

Intention locks add IX and IS on top of the original X and S locks. IX and IS are both table-level locks. They indicate that a transaction intends to add an X lock or an S lock to some data row in the table. Two rules apply:

(1) Before a transaction acquires an S lock on a data row, it must first acquire an IS lock or a stronger lock on the table;
(2) Before a transaction acquires an X lock on a data row, it must first acquire an IX lock on the table.

With intention locks, when transaction T wants to add an X lock to table A, it only needs to check whether another transaction has added an X/IX/S/IS lock to the table. If so, another transaction is using a lock on the table or on a row in it, so transaction T cannot add the X lock.

The compatibility of the various locks is as follows:

      |  X   |  IX   |  S    |  IS
  X   |  No  |  No   |  No   |  No
  IX  |  No  |  Yes  |  No   |  Yes
  S   |  No  |  No   |  Yes  |  Yes
  IS  |  No  |  Yes  |  Yes  |  Yes

The explanation is as follows:

(1) Any IS/IX locks are compatible with each other, because they only indicate an intention to lock rows in the table rather than actually locking the table;
(2) The compatibility described here applies to table-level locks. A table-level IX lock is compatible with row-level X locks, so two transactions can each add an X lock to a different data row. (Transaction T1 wants to add an X lock to data row R1, and transaction T2 wants to add an X lock to data row R2 of the same table. Both transactions need to add an IX lock to the table, but IX locks are compatible with each other and with row-level X locks, so both transactions can successfully lock and modify their rows in the same table.)
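
The table-level compatibility rules above can be written down as a lookup table. This is a minimal Python sketch for illustration only; COMPATIBLE and can_grant are hypothetical names, not part of any MySQL API, and the sketch ignores row-level locks entirely.

```python
# COMPATIBLE[held][requested] is True when the two table-level locks can coexist.
# The values follow the standard S/X/IS/IX compatibility rules described above.
COMPATIBLE = {
    "IS": {"IS": True,  "IX": True,  "S": True,  "X": False},
    "IX": {"IS": True,  "IX": True,  "S": False, "X": False},
    "S":  {"IS": True,  "IX": False, "S": True,  "X": False},
    "X":  {"IS": False, "IX": False, "S": False, "X": False},
}


def can_grant(requested, held_locks):
    """A request is granted only if it is compatible with every lock already held."""
    return all(COMPATIBLE[held][requested] for held in held_locks)


# T1 and T2 both want to lock different rows with X, so both request IX on the table.
print(can_grant("IX", ["IX"]))
# A table-level X lock cannot be granted while another transaction holds IS.
print(can_grant("X", ["IS"]))
```

The point of the design is that a table-level request only has to be checked against the handful of table-level locks already held, never against individual rows.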

Three-level locking protocol

Level 1 locking protocol (solves the dirty write problem)

When transaction T wants to modify data A, it must add an X lock, and the lock is not released until T ends.

This solves the dirty write problem: two transactions cannot modify the same data at the same time, so one transaction's modification is never overwritten.

Level 2 locking protocol (solves the dirty read problem)

On top of level 1, a transaction must add an S lock before reading data A and release the S lock immediately after reading.

This solves the dirty read problem: if a transaction is modifying data A, the level 1 protocol requires it to hold an X lock, so no S lock can be added and the uncommitted data cannot be read.

Level 3 locking protocol (solves the non-repeatable read problem)

On top of level 2, a transaction must add an S lock before reading data A and cannot release the S lock until the transaction ends.

This solves the non-repeatable read problem: while A is being read, other transactions cannot add an X lock to A, so the data cannot change between reads.

Two-phase locking protocol (serializable)

Locking and unlocking are divided into two phases.

A serializable schedule is one in which concurrency control makes the result of concurrently executed transactions the same as the result of some serial execution of those transactions. Transactions executed serially do not interfere with each other, so no concurrency consistency problems arise.

Compliance with the two-phase locking protocol is a sufficient condition for a serializable schedule. For example, the following operations satisfy the two-phase locking protocol, so the schedule is serializable.

lock-x(A)...lock-s(B)...lock-s(C)...unlock(A)...unlock(C)...unlock(B)

But it is not a necessary condition. For example, the following operations do not satisfy the two-phase locking protocol, yet the schedule is still serializable.

lock-x(A)...unlock(A)...lock-s(B)...unlock(B)...lock-s(C)...unlock(C)
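
Whether a sequence of operations obeys the two-phase rule can be checked mechanically: once any unlock has happened, no further lock may be acquired. The Python sketch below is illustrative only; the function name and the string format of the operations are assumptions that mirror the notation used in the two examples above.

```python
def follows_two_phase_locking(schedule):
    """Return True if no lock is acquired after the first unlock."""
    unlocked = False
    for op in schedule:
        if op.startswith("unlock"):
            unlocked = True      # the shrinking phase has begun
        elif op.startswith("lock") and unlocked:
            return False         # a lock acquired after an unlock violates 2PL
    return True


first = ["lock-x(A)", "lock-s(B)", "lock-s(C)", "unlock(A)", "unlock(C)", "unlock(B)"]
second = ["lock-x(A)", "unlock(A)", "lock-s(B)", "unlock(B)", "lock-s(C)", "unlock(C)"]

print(follows_two_phase_locking(first))
print(follows_two_phase_locking(second))
```

The check only tests compliance with the protocol. As the text notes, a schedule that fails it may still be serializable.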

MySQL implicit and explicit locking

MySQL's InnoDB storage engine uses the two-phase locking protocol. It locks automatically when needed according to the isolation level, and all locks are released at the same time. This is called implicit locking.

InnoDB can also use specific statements for explicit locking:

SELECT ... LOCK IN SHARE MODE;
SELECT ... FOR UPDATE;

Multi-version concurrency control

Multi-Version Concurrency Control (MVCC) is the specific way in which MySQL's InnoDB storage engine implements isolation levels. It is used to implement the READ COMMITTED and REPEATABLE READ isolation levels. The READ UNCOMMITTED isolation level always reads the latest data row, which is an undemanding requirement and does not need MVCC. The SERIALIZABLE isolation level requires locking all rows that are read, which cannot be achieved with MVCC alone.

Basic idea

As mentioned in the section on locking, locks can solve the concurrency consistency problems that occur when multiple transactions execute at the same time. In practice, read operations often outnumber write operations, so read-write locks were introduced to avoid unnecessary locking. For example, reads do not exclude other reads. Reads and writes under read-write locks are still mutually exclusive, whereas MVCC uses the idea of multiple versions: write operations update the latest version snapshot, while read operations read an older version snapshot, so there is no mutual exclusion between them. This is similar to CopyOnWrite.

In MVCC, a transaction's modification operations (DELETE, INSERT, UPDATE) add a new version snapshot to the data row.

The root cause of dirty reads and non-repeatable reads is that a transaction reads uncommitted modifications made by other transactions. To solve dirty reads and non-repeatable reads, MVCC stipulates that a read operation can only read committed snapshots. Of course, a transaction can read its own uncommitted snapshots, which does not count as a dirty read.

Version numbers

System version number SYS_ID: an increasing number. Every time a new transaction starts, the system version number is automatically incremented.
Transaction version number TRX_ID: the system version number at the time the transaction started.

Undo log

The multiple versions in MVCC are multiple versions of snapshots. The snapshots are stored in the undo log, which links all snapshots of a data row together through the rollback pointer ROLL_PTR.

For example, create a table t in MySQL, containing the primary key id and a field x. We first insert a data row and then perform two update operations on the data row.

INSERT INTO t(id, x) VALUES(1, 'a');
UPDATE t SET x='b' WHERE id=1;
UPDATE t SET x='c' WHERE id=1;

Because START TRANSACTION is not used to run the operations above as a single transaction, MySQL's AUTOCOMMIT mechanism treats each operation as its own transaction, so the operations above involve three transactions in total. In addition to the transaction version number TRX_ID and the operation, each snapshot also records a one-bit DEL field that marks whether the row has been deleted.
INSERT, UPDATE, and DELETE operations each create a log entry and write the transaction version number TRX_ID into it. DELETE can be regarded as a special UPDATE that additionally sets the DEL field to 1.

ReadView

MVCC maintains a ReadView structure, which mainly contains the list of transactions in the current system that have not yet committed, TRX_IDs {TRX_ID_1, TRX_ID_2, ...}, together with the minimum and maximum values of that list, TRX_ID_MIN and TRX_ID_MAX.
When a SELECT is executed, whether a data row snapshot can be used is determined by comparing the snapshot's TRX_ID with TRX_ID_MIN and TRX_ID_MAX:

(1) TRX_ID < TRX_ID_MIN: the data row snapshot was changed before all currently uncommitted transactions, so it can be used.

(2) TRX_ID > TRX_ID_MAX: the data row snapshot was changed after the transaction started, so it cannot be used.

(3) TRX_ID_MIN <= TRX_ID <= TRX_ID_MAX: the decision depends on the isolation level:

  • Read committed: If TRX_ID is in the TRX_IDs list, the transaction corresponding to the data row snapshot has not yet committed, so the snapshot cannot be used. Otherwise, it has committed and the snapshot can be used.

  • Repeatable read: Neither can be used. Because if it can be used, other transactions can also read this data row snapshot and modify it. Then the value obtained by the current transaction when reading this data row will change, which means that a non-repeatable read problem occurs.

When a data row snapshot cannot be used, follow the undo log's rollback pointer ROLL_PTR to the next snapshot and repeat the check above.
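
The visibility rules and the walk along ROLL_PTR can be sketched in a few lines of Python. This is a simplified model of the rules exactly as stated in this section, not InnoDB's actual implementation; the Snapshot class, the function names, and the sample transaction IDs are all assumptions made for the example, and the sketch requires a non-empty list of uncommitted transactions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Snapshot:
    trx_id: int                      # transaction that wrote this version
    value: str
    roll_ptr: Optional["Snapshot"]   # older version in the undo log, if any


def is_visible(trx_id: int, active: List[int], level: str) -> bool:
    """Apply rules (1)-(3); `active` plays the role of the ReadView's TRX_IDs list."""
    if not active:
        raise ValueError("this sketch needs at least one uncommitted transaction")
    trx_id_min, trx_id_max = min(active), max(active)
    if trx_id < trx_id_min:
        return True                  # rule (1)
    if trx_id > trx_id_max:
        return False                 # rule (2)
    if level == "READ COMMITTED":
        return trx_id not in active  # rule (3), read committed
    return False                     # rule (3), repeatable read


def read(row: Optional[Snapshot], active: List[int], level: str) -> Optional[str]:
    """Follow roll_ptr until a usable snapshot is found."""
    while row is not None and not is_visible(row.trx_id, active, level):
        row = row.roll_ptr
    return row.value if row is not None else None


# Hypothetical version chain: "c" (trx 4) -> "b" (trx 3) -> "a" (trx 1).
v1 = Snapshot(1, "a", None)
v2 = Snapshot(3, "b", v1)
v3 = Snapshot(4, "c", v2)

print(read(v3, [2, 5], "READ COMMITTED"))
print(read(v3, [2, 5], "REPEATABLE READ"))
```

With uncommitted transactions 2 and 5, the read committed reader accepts the newest snapshot because transaction 4 is not in the list, while the repeatable read reader walks back to the snapshot written before TRX_ID_MIN.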

Snapshot read and current read

Snapshot read

A SELECT under MVCC reads data from a snapshot and does not require locking.

SELECT * FROM table ...;

Current read

Other MVCC operations that modify the database (INSERT, UPDATE, DELETE) require locking so that they read the latest data. MVCC therefore does not eliminate locking entirely; it only avoids locking for SELECT.

INSERT;
UPDATE;
DELETE;

When performing a SELECT, you can explicitly request locking. The first statement below takes an S lock, and the second takes an X lock.

SELECT * FROM table WHERE ? LOCK IN SHARE MODE;
SELECT * FROM table WHERE ? FOR UPDATE;

Next-Key Locks

Next-key locks are a lock implementation of MySQL's InnoDB storage engine.

MVCC alone cannot solve the phantom read problem; next-key locks exist to solve it. Under the REPEATABLE READ isolation level, using MVCC together with next-key locks solves the phantom read problem.

Record Locks

Locks the index entry of a record, not the record itself.

If the table has no index, InnoDB automatically creates a hidden clustered index, so record locks can still be used.

Gap Locks

Locks the gaps between index entries, but not the index entries themselves. For example, while a transaction executes the following statement, other transactions cannot insert the value 15 into t.c.

SELECT c FROM t WHERE c BETWEEN 10 AND 20 FOR UPDATE;

Next-Key Locks

It is a combination of record locks and gap locks: it locks both the index entry of a record and the gap before it, covering a left-open, right-closed interval. For example, if an index contains the values 10, 11, 13, and 20, the following intervals need to be locked:

(-∞, 10]
(10, 11]
(11, 13]
(13, 20]
(20, +∞)
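
The list of intervals can be derived mechanically from the index values. The following Python sketch only illustrates how the intervals are formed; next_key_intervals is a hypothetical helper, and "-inf" and "+inf" stand in for the infinity symbols.

```python
def next_key_intervals(index_values):
    """Return the left-open, right-closed intervals around the sorted index values."""
    keys = sorted(index_values)
    if not keys:
        return ["(-inf, +inf)"]
    bounds = ["-inf"] + [str(k) for k in keys]
    # Each index value closes the interval that starts at the previous value.
    intervals = [f"({low}, {high}]" for low, high in zip(bounds, bounds[1:])]
    # The gap after the largest value has no closing record.
    intervals.append(f"({keys[-1]}, +inf)")
    return intervals


for interval in next_key_intervals([10, 11, 13, 20]):
    print(interval)
```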

3. Differences between the three normal forms



4. The difference between MyISAM and InnoDB

Transactions: InnoDB is transactional and supports the COMMIT and ROLLBACK statements.

Concurrency: MyISAM only supports table-level locks, while InnoDB also supports row-level locks.

Foreign keys: InnoDB supports foreign keys.

Backup: InnoDB supports online hot backup.

Crash recovery: MyISAM has a much higher probability of corruption after a crash than InnoDB, and the recovery speed is also slower.

Other features: MyISAM supports compressed tables and spatial data indexes.


5. B+Tree principle and MySQL index

See: MySQL index


6. Window functions

Suppose we have the following data table, which records the sales of a shopping website in each district of each city:

CREATE TABLE sales(
id INT PRIMARY KEY AUTO_INCREMENT,
city VARCHAR(15),
county VARCHAR(15),
sales_value DECIMAL
);

INSERT INTO sales(city,county,sales_value)
VALUES
('北京','海淀',10.00),
('北京','朝阳',20.00),
('上海','黄埔',30.00),
('上海','长宁',10.00);

Query:

mysql> SELECT * FROM sales;
+----+------+--------+-------------+
| id | city | county | sales_value |
+----+------+--------+-------------+
| 1 | 北京 | 海淀 | 10 |
| 2 | 北京 | 朝阳 | 20 |
| 3 | 上海 | 黄埔 | 30 |
| 4 | 上海 | 长宁 | 10 |
+----+------+--------+-------------+
4 rows in set (0.00 sec)

Requirement:
Calculate the total sales of this website in each city, the total sales nationwide, the ratio of each district's sales to the sales of its city, and the ratio of each district's sales to total sales.
With grouping and aggregate functions, this has to be calculated in several steps.

The first step is to calculate the total sales amount and store it in temporary table a:

CREATE TEMPORARY TABLE a -- create a temporary table
SELECT SUM(sales_value) AS sales_value -- grand total
FROM sales;

Take a look at temporary table a:

mysql> SELECT * FROM a;
+-------------+
| sales_value |
+-------------+
| 70 |
+-------------+
1 row in set (0.00 sec)

The second step is to calculate the total sales of each city and store it in temporary table b:

CREATE TEMPORARY TABLE b -- create a temporary table
SELECT city, SUM(sales_value) AS sales_value -- total sales per city
FROM sales
GROUP BY city;

Take a look at temporary table b:

mysql> SELECT * FROM b;
+------+-------------+
| city | sales_value |
+------+-------------+
| 北京 | 30 |
| 上海 | 40 |
+------+-------------+
2 rows in set (0.00 sec)

The third step is to calculate the ratio of each district's sales to the total for its city and to the overall total. The following join query produces the required results:

mysql> SELECT s.city AS 城市,s.county AS 区,s.sales_value AS 区销售额,
-> b.sales_value AS 市销售额,s.sales_value/b.sales_value AS 市比率,
-> a.sales_value AS 总销售额,s.sales_value/a.sales_value AS 总比率
-> FROM sales s
-> JOIN b ON (s.city=b.city) -- join the per-city totals temporary table
-> JOIN a -- join the grand total temporary table
-> ORDER BY s.city,s.county;
+------+------+----------+----------+--------+----------+--------+
| 城市 | 区 | 区销售额 | 市销售额 | 市比率 | 总销售额 | 总比率 |
+------+------+----------+----------+--------+----------+--------+
| 上海 | 长宁 | 10 | 40 | 0.2500 | 70 | 0.1429 |
| 上海 | 黄埔 | 30 | 40 | 0.7500 | 70 | 0.4286 |
| 北京 | 朝阳 | 20 | 30 | 0.6667 | 70 | 0.2857 |
| 北京 | 海淀 | 10 | 30 | 0.3333 | 70 | 0.1429 |
+------+------+----------+----------+--------+----------+--------+
4 rows in set (0.00 sec)

The results show that the city sales amount, the city sales ratio, the total sales amount, and the total sales ratio have all been calculated.

The same query is much simpler with window functions. We can use the following code:

mysql> SELECT city AS 城市,county AS 区,sales_value AS 区销售额,
-> SUM(sales_value) OVER(PARTITION BY city) AS 市销售额, -- city sales total
-> sales_value/SUM(sales_value) OVER(PARTITION BY city) AS 市比率,
-> SUM(sales_value) OVER() AS 总销售额, -- overall sales total
-> sales_value/SUM(sales_value) OVER() AS 总比率
-> FROM sales
-> ORDER BY city,county;
+------+------+----------+----------+--------+----------+--------+
| 城市 | 区 | 区销售额 | 市销售额 | 市比率 | 总销售额 | 总比率 |
+------+------+----------+----------+--------+----------+--------+
| 上海 | 长宁 | 10 | 40 | 0.2500 | 70 | 0.1429 |
| 上海 | 黄埔 | 30 | 40 | 0.7500 | 70 | 0.4286 |
| 北京 | 朝阳 | 20 | 30 | 0.6667 | 70 | 0.2857 |
| 北京 | 海淀 | 10 | 30 | 0.3333 | 70 | 0.1429 |
+------+------+----------+----------+--------+----------+--------+
4 rows in set (0.00 sec)

The results are the same as those of the query above.
With window functions, the query is completed in a single step. Moreover, since no temporary table is used, execution is also more efficient. Clearly, in scenarios like this, where the results of group statistics are needed to compute a value for each record, window functions are the better choice.

MySQL supports window functions starting from version 8.0. The role of the window function is similar to grouping data in a query. The difference is that the grouping operation aggregates the grouped results into one record, while the window function places the results in each data record.
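
The difference between grouping and a window function can be seen by doing both by hand on the sales rows. The following Python sketch is only a model of the idea, not how MySQL evaluates the query; the variable names are made up for the example and the data is copied from the sales table above.

```python
from collections import defaultdict

# Rows of the sales table above: (city, county, sales_value).
sales = [
    ("北京", "海淀", 10),
    ("北京", "朝阳", 20),
    ("上海", "黄埔", 30),
    ("上海", "长宁", 10),
]

# GROUP BY city: the rows of each group collapse into one aggregated record.
city_total = defaultdict(int)
for city, _, value in sales:
    city_total[city] += value

# SUM(sales_value) OVER (PARTITION BY city): every row is kept,
# and the total of its partition is attached to it.
windowed = [(city, county, value, city_total[city]) for city, county, value in sales]

print(dict(city_total))
for row in windowed:
    print(row)
```

Grouping produces two records here, while the windowed result still has four rows, each carrying its city total.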

Window functions can be divided into static window functions and dynamic window functions.
(1) The window size of a static window function is fixed and does not vary from record to record;
(2) The window size of a dynamic window function varies from record to record.

Window functions can generally be divided into ordinal functions, distribution functions, before/after functions, first/last functions, and other functions:

Ordinal functions: ROW_NUMBER(), RANK(), DENSE_RANK()
Distribution functions: PERCENT_RANK(), CUME_DIST()
Before/after functions: LAG(expr, n), LEAD(expr, n)
First/last functions: FIRST_VALUE(expr), LAST_VALUE(expr)
Other functions: NTH_VALUE(expr, n), NTILE(n)

Syntax

The syntax of a window function is:
function OVER ([PARTITION BY column_name ORDER BY column_name ASC|DESC])
or:
function OVER window_name … WINDOW window_name AS ([PARTITION BY column_name ORDER BY column_name ASC|DESC])

window_function ( expr ) OVER ( 
  PARTITION BY ... 
  ORDER BY ... 
  frame_clause 
)


(1) OVER keyword: specifies the scope of the function's window:

  • If the parentheses that follow are empty, the window includes all records that satisfy the WHERE condition, and the window function is calculated over all of them.

  • If the parentheses after the OVER keyword are not empty, the following syntax can be used to define the window.

(2) Window name: sets an alias that identifies the window.

(3) PARTITION BY clause: specifies the fields by which the window function groups records. After grouping, the window function is executed separately within each group.

(4) ORDER BY clause: specifies the fields by which the window function sorts records. Sorting lets the window function number the data records in sorted order.

(5) FRAME clause: defines rules for a subset of the partition, which can be used as a sliding window.

Create table:

CREATE TABLE goods(
	id INT PRIMARY KEY AUTO_INCREMENT,
	category_id INT,
	category VARCHAR(15),
	NAME VARCHAR(30),
	price DECIMAL(10,2),
	stock INT,
	upper_time DATETIME
);

Insert data:

INSERT INTO goods(category_id, category, NAME, price, stock, upper_time)
VALUES
(1, '女装/女士精品', 'T恤', 39.90, 1000, '2020-11-10 00:00:00'),
(1, '女装/女士精品', '连衣裙', 79.90, 2500, '2020-11-10 00:00:00'),
(1, '女装/女士精品', '卫衣', 89.90, 1500, '2020-11-10 00:00:00'),
(1, '女装/女士精品', '牛仔裤', 89.90, 3500, '2020-11-10 00:00:00'),
(1, '女装/女士精品', '百褶裙', 29.90, 500, '2020-11-10 00:00:00'),
(1, '女装/女士精品', '呢绒外套', 399.90, 1200, '2020-11-10 00:00:00'),
(2, '户外运动', '自行车', 399.90, 1000, '2020-11-10 00:00:00'),
(2, '户外运动', '山地自行车', 1399.90, 2500, '2020-11-10 00:00:00'),
(2, '户外运动', '登山杖', 59.90, 1500, '2020-11-10 00:00:00'),
(2, '户外运动', '骑行装备', 399.90, 3500, '2020-11-10 00:00:00'),
(2, '户外运动', '运动外套', 799.90, 500, '2020-11-10 00:00:00'),
(2, '户外运动', '滑板', 499.90, 1200, '2020-11-10 00:00:00');

Next, we verify the function of each window function based on the data in the goods table.

Ordinal functions

ROW_NUMBER() function

The ROW_NUMBER() function numbers the rows sequentially within each partition.
Example: Query the products in each category of the goods table, in descending order of price.

SELECT ROW_NUMBER() OVER(PARTITION BY category_id ORDER BY price DESC) AS row_num, 
id, category_id, category, NAME, price, stock
FROM goods;


+---------+----+-------------+---------------+------------+---------+-------+
| row_num | id | category_id | category | NAME | price | stock |
+---------+----+-------------+---------------+------------+---------+-------+
| 1 | 6 | 1 | 女装/女士精品 | 呢绒外套 | 399.90 | 1200 |
| 2 | 3 | 1 | 女装/女士精品 | 卫衣 | 89.90 | 1500 |
| 3 | 4 | 1 | 女装/女士精品 | 牛仔裤 | 89.90 | 3500 |
| 4 | 2 | 1 | 女装/女士精品 | 连衣裙 | 79.90 | 2500 |
| 5 | 1 | 1 | 女装/女士精品 | T恤 | 39.90 | 1000 |
| 6 | 5 | 1 | 女装/女士精品 | 百褶裙 | 29.90 | 500 |
| 1 | 8 | 2 | 户外运动 | 山地自行车 | 1399.90 | 2500 |
| 2 | 11 | 2 | 户外运动 | 运动外套 | 799.90 | 500 |
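| 3 | 12 | 2 | 户外运动 | 滑板 | 499.90 | 1200 |
| 4 | 7 | 2 | 户外运动 | 自行车 | 399.90 | 1000 |
| 5 | 10 | 2 | 户外运动 | 骑行装备 | 399.90 | 3500 |
| 6 | 9 | 2 | 户外运动 | 登山杖 | 59.90 | 1500 |
+---------+----+-------------+---------------+------------+---------+-------+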
12 rows in set (0.00 sec)

Example: Query the information of the three highest-priced products in each product category in the goods data table.

SELECT * FROM (
	SELECT ROW_NUMBER() OVER(PARTITION BY category_id ORDER BY price DESC) AS row_num, 
	id, category_id, category, NAME, price, stock 
	FROM goods) t -- a derived table must be given an alias, otherwise an error is reported
WHERE row_num <= 3;

+---------+----+-------------+---------------+------------+---------+-------+
| row_num | id | category_id | category | NAME | price | stock |
+---------+----+-------------+---------------+------------+---------+-------+
| 1 | 6 | 1 | 女装/女士精品 | 呢绒外套 | 399.90 | 1200 |
| 2 | 3 | 1 | 女装/女士精品 | 卫衣 | 89.90 | 1500 |
| 3 | 4 | 1 | 女装/女士精品 | 牛仔裤 | 89.90 | 3500 |
| 1 | 8 | 2 | 户外运动 | 山地自行车 | 1399.90 | 2500 |
| 2 | 11 | 2 | 户外运动 | 运动外套 | 799.90 | 500 |
| 3 | 12 | 2 | 户外运动 | 滑板 | 499.90 | 1200 |
+---------+----+-------------+---------------+------------+---------+-------+
6 rows in set (0.00 sec)

In the category 女装/女士精品 (women's clothing), two products are priced at 89.90: the sweatshirt (卫衣) and the jeans (牛仔裤). Both should have serial number 2, rather than 2 for one and 3 for the other. The RANK() and DENSE_RANK() functions solve this problem.

RANK() function

The RANK() function gives tied rows the same serial number and skips the numbers that the ties would have occupied, for example 1, 1, 3.
Example: Use the RANK() function to list the products of each category in the goods table, ranked by price from high to low.

SELECT RANK() OVER(PARTITION BY category_id ORDER BY price DESC) AS row_num,
id, category_id, category, NAME, price, stock
FROM goods;

+---------+----+-------------+---------------+------------+---------+-------+
| row_num | id | category_id | category | NAME | price | stock |
+---------+----+-------------+---------------+------------+---------+-------+
| 1 | 6 | 1 | 女装/女士精品 | 呢绒外套 | 399.90 | 1200 |
| 2 | 3 | 1 | 女装/女士精品 | 卫衣 | 89.90 | 1500 |
| 2 | 4 | 1 | 女装/女士精品 | 牛仔裤 | 89.90 | 3500 |
| 4 | 2 | 1 | 女装/女士精品 | 连衣裙 | 79.90 | 2500 |
| 5 | 1 | 1 | 女装/女士精品 | T恤 | 39.90 | 1000 |
| 6 | 5 | 1 | 女装/女士精品 | 百褶裙 | 29.90 | 500 |
| 1 | 8 | 2 | 户外运动 | 山地自行车 | 1399.90 | 2500 |
| 2 | 11 | 2 | 户外运动 | 运动外套 | 799.90 | 500 |
| 3 | 12 | 2 | 户外运动 | 滑板 | 499.90 | 1200 |
| 4 | 7 | 2 | 户外运动 | 自行车 | 399.90 | 1000 |
| 4 | 10 | 2 | 户外运动 | 骑行装备 | 399.90 | 3500 |
| 6 | 9 | 2 | 户外运动 | 登山杖 | 59.90 | 1500 |
+---------+----+-------------+---------------+------------+---------+-------+
12 rows in set (0.00 sec)

Example: Use the RANK() function to obtain information on the four highest-priced products in the goods data table with the category "Women's Clothing/Ladies' Boutique"

SELECT * FROM (
	SELECT RANK() OVER(PARTITION BY category_id ORDER BY price DESC) AS row_num,
	id, category_id, category, NAME, price, stock
	FROM goods) t
WHERE category_id = 1 AND row_num <= 4;

+---------+----+-------------+---------------+----------+--------+-------+
| row_num | id | category_id | category | NAME | price | stock |
+---------+----+-------------+---------------+----------+--------+-------+
| 1 | 6 | 1 | 女装/女士精品 | 呢绒外套 | 399.90 | 1200 |
| 2 | 3 | 1 | 女装/女士精品 | 卫衣 | 89.90 | 1500 |
| 2 | 4 | 1 | 女装/女士精品 | 牛仔裤 | 89.90 | 3500 |
| 4 | 2 | 1 | 女装/女士精品 | 连衣裙 | 79.90 | 2500 |
+---------+----+-------------+---------------+----------+--------+-------+
4 rows in set (0.00 sec)

It can be seen that the serial numbers obtained by using the RANK() function are 1, 2, 2, and 4. Products with the same price have the same serial numbers. The subsequent product serial numbers are discontinuous and repeated serial numbers are skipped.

DENSE_RANK() function

The DENSE_RANK() function gives tied rows the same serial number and does not skip any numbers afterwards, for example 1, 1, 2.
Example: Use the DENSE_RANK() function to list the products of each category in the goods table, ranked by price from high to low.

SELECT DENSE_RANK() OVER(PARTITION BY category_id ORDER BY price DESC) AS row_num,
id, category_id, category, NAME, price, stock
FROM goods;

+---------+----+-------------+---------------+------------+---------+-------+
| row_num | id | category_id | category | NAME | price | stock |
+---------+----+-------------+---------------+------------+---------+-------+
| 1 | 6 | 1 | 女装/女士精品 | 呢绒外套 | 399.90 | 1200 |
| 2 | 3 | 1 | 女装/女士精品 | 卫衣 | 89.90 | 1500 |
| 2 | 4 | 1 | 女装/女士精品 | 牛仔裤 | 89.90 | 3500 |
| 3 | 2 | 1 | 女装/女士精品 | 连衣裙 | 79.90 | 2500 |
| 4 | 1 | 1 | 女装/女士精品 | T恤 | 39.90 | 1000 |
| 5 | 5 | 1 | 女装/女士精品 | 百褶裙 | 29.90 | 500 |
| 1 | 8 | 2 | 户外运动 | 山地自行车 | 1399.90 | 2500 |
| 2 | 11 | 2 | 户外运动 | 运动外套 | 799.90 | 500 |
| 3 | 12 | 2 | 户外运动 | 滑板 | 499.90 | 1200 |
| 4 | 7 | 2 | 户外运动 | 自行车 | 399.90 | 1000 |
| 4 | 10 | 2 | 户外运动 | 骑行装备 | 399.90 | 3500 |
| 5 | 9 | 2 | 户外运动 | 登山杖 | 59.90 | 1500 |
+---------+----+-------------+---------------+------------+---------+-------+
12 rows in set (0.00 sec)

Example: Use the DENSE_RANK() function to obtain information on the four highest-priced products in the goods data table in the category "Women's Clothing/Ladies' Boutique".

SELECT * FROM (
	SELECT DENSE_RANK() OVER(PARTITION BY category_id ORDER BY price DESC) AS row_num,
	id, category_id, category, NAME, price, stock
	FROM goods) t
WHERE category_id = 1 AND row_num <= 3;

The serial numbers obtained with the DENSE_RANK() function are 1, 2, 2, and 3. Products with the same price share the same serial number, the numbers that follow are consecutive, and no numbers are skipped.
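
The three ordinal functions differ only in how they treat ties. The Python sketch below reproduces their numbering for one partition that is already sorted. It is an illustration of the rules described above, not MySQL's implementation; number_rows is a hypothetical helper, and the prices are those of category_id 1 in descending order.

```python
def number_rows(values):
    """Return (row_number, rank, dense_rank) for values already sorted by ORDER BY."""
    result = []
    rank = dense = 0
    previous = object()  # sentinel that never equals a real value
    for row_number, value in enumerate(values, start=1):
        if value != previous:
            rank = row_number  # RANK jumps to the current row position
            dense += 1         # DENSE_RANK advances by exactly one
            previous = value
        result.append((row_number, rank, dense))
    return result


prices = [399.90, 89.90, 89.90, 79.90, 39.90, 29.90]  # category_id = 1, price DESC
for price, numbers in zip(prices, number_rows(prices)):
    print(price, numbers)
```

The two rows priced at 89.90 get row numbers 2 and 3, but rank 2 and dense rank 2. The next row then gets rank 4 and dense rank 3, matching the result tables above.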

Distribution function

PERCENT_RANK() function

The PERCENT_RANK() function returns the relative rank of a row within its partition as a value between 0 and 1. It is calculated as follows.

(rank - 1) / (rows - 1)

Here, rank is the serial number produced by the RANK() function, and rows is the total number of rows in the current partition.

Example: Calculate the PERCENT_RANK value of the goods under the category named "Women's Clothing/Ladies' Boutique" in the goods data table.

# Method 1:
SELECT RANK() OVER (PARTITION BY category_id ORDER BY price DESC) AS r,
PERCENT_RANK() OVER (PARTITION BY category_id ORDER BY price DESC) AS pr,
id, category_id, category, NAME, price, stock
FROM goods
WHERE category_id = 1;

# Method 2:
SELECT RANK() OVER w AS r,
PERCENT_RANK() OVER w AS pr,
id, category_id, category, NAME, price, stock
FROM goods
WHERE category_id = 1
WINDOW w AS (PARTITION BY category_id ORDER BY price DESC);

+---+-----+----+-------------+---------------+----------+--------+-------+
| r | pr | id | category_id | category | NAME | price | stock |
+---+-----+----+-------------+---------------+----------+--------+-------+
| 1 | 0 | 6 | 1 | 女装/女士精品 | 呢绒外套 | 399.90 | 1200 |
| 2 | 0.2 | 3 | 1 | 女装/女士精品 | 卫衣 | 89.90 | 1500 |
| 2 | 0.2 | 4 | 1 | 女装/女士精品 | 牛仔裤 | 89.90 | 3500 |
| 4 | 0.6 | 2 | 1 | 女装/女士精品 | 连衣裙 | 79.90 | 2500 |
| 5 | 0.8 | 1 | 1 | 女装/女士精品 | T恤 | 39.90 | 1000 |
| 6 | 1 | 5 | 1 | 女装/女士精品 | 百褶裙 | 29.90 | 500 |
+---+-----+----+-------------+---------------+----------+--------+-------+
6 rows in set (0.00 sec)
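
The pr column can be checked by hand against the formula (rank - 1) / (rows - 1). The Python sketch below does that for the six ranks in the r column; `percent_rank` is a hypothetical helper, and the value returned for a one-row partition is an assumption made here only to avoid dividing by zero.

```python
def percent_rank(ranks):
    """Apply (rank - 1) / (rows - 1) to the RANK() values of one partition."""
    rows = len(ranks)
    if rows <= 1:
        # Assumption for this sketch: treat a single-row partition as 0.
        return [0.0] * rows
    return [(rank - 1) / (rows - 1) for rank in ranks]


ranks = [1, 2, 2, 4, 5, 6]   # the r column of the result above
print(percent_rank(ranks))
```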

CUME_DIST() function

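The CUME_DIST() function returns the cumulative distribution of a value: the proportion of rows in the partition whose ordering value is less than or equal to that of the current row (for an ascending sort).
Example: For each product in the goods data table, query the proportion of products in the same category whose price is less than or equal to the current price.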

SELECT CUME_DIST() OVER(PARTITION BY category_id ORDER BY price ASC) AS cd,
id, category, NAME, price
FROM goods;
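
As a rough cross-check of what the cd column should contain, the following Python sketch computes the same proportion for the category 1 prices. `cume_dist` is a hypothetical helper written for illustration, not MySQL's implementation.

```python
def cume_dist(values):
    """For each value in ascending order, the fraction of values that are <= it."""
    ordered = sorted(values)
    total = len(ordered)
    return [
        (value, sum(1 for other in ordered if other <= value) / total)
        for value in ordered
    ]


prices = [29.90, 39.90, 79.90, 89.90, 89.90, 399.90]   # category 1 prices
for price, cd in cume_dist(prices):
    print(f"{price:>7.2f}  cd={cd:.4f}")
```

The two products priced 89.90 get the same value, 5/6, because ties count toward each other.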

Before and after functions

LAG(expr,n) function

The LAG(expr, n) function returns the value of expr from the row that is n rows before the current row, or NULL if there is no such row.
Example: Query the difference between the previous product price and the current product price in the goods data table.

SELECT id, category, NAME, price, pre_price, price - pre_price AS diff_price
FROM (
	SELECT id, category, NAME, price,LAG(price,1) OVER w AS pre_price
	FROM goods
	WINDOW w AS (PARTITION BY category_id ORDER BY price)) t;

+----+---------------+------------+---------+-----------+------------+
| id | category | NAME | price | pre_price | diff_price |
+----+---------------+------------+---------+-----------+------------+
| 5 | 女装/女士精品 | 百褶裙 | 29.90 | NULL | NULL |
| 1 | 女装/女士精品 | T恤 | 39.90 | 29.90 | 10.00 |
| 2 | 女装/女士精品 | 连衣裙 | 79.90 | 39.90 | 40.00 |
| 3 | 女装/女士精品 | 卫衣 | 89.90 | 79.90 | 10.00 |
| 4 | 女装/女士精品 | 牛仔裤 | 89.90 | 89.90 | 0.00 |
| 6 | 女装/女士精品 | 呢绒外套 | 399.90 | 89.90 | 310.00 |
| 9 | 户外运动 | 登山杖 | 59.90 | NULL | NULL |
| 7 | 户外运动 | 自行车 | 399.90 | 59.90 | 340.00 |
| 10 | 户外运动 | 骑行装备 | 399.90 | 399.90 | 0.00 |
| 12 | 户外运动 | 滑板 | 499.90 | 399.90 | 100.00 |
| 11 | 户外运动 | 运动外套 | 799.90 | 499.90 | 300.00 |
| 8 | 户外运动 | 山地自行车 | 1399.90 | 799.90 | 600.00 |
+----+---------------+------------+---------+-----------+------------+
12 rows in set (0.00 sec)

LEAD(expr,n) function

The LEAD(expr, n) function returns the value of expr from the row that is n rows after the current row, or NULL if there is no such row.
Example: Query the difference between the price of the next product and the price of the current product in the goods data table.

SELECT id, category, NAME, behind_price, price, behind_price - price AS diff_price
FROM(
	SELECT id, category, NAME, price, LEAD(price, 1) OVER w AS behind_price
	FROM goods
	WINDOW w AS (PARTITION BY category_id ORDER BY price)) t;
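
LAG and LEAD are the same lookup in opposite directions. The Python sketch below shows this on an already sorted list of prices from one partition; the helpers `lag` and `lead` are hypothetical names used only for illustration.

```python
def lag(values, n=1):
    """Value n positions before each element, or None when out of range."""
    return [values[i - n] if i - n >= 0 else None for i in range(len(values))]


def lead(values, n=1):
    """Value n positions after each element, or None when out of range."""
    return [values[i + n] if i + n < len(values) else None for i in range(len(values))]


prices = [29.90, 39.90, 79.90, 89.90, 89.90, 399.90]   # category 1, sorted by price
for price, before, after in zip(prices, lag(prices), lead(prices)):
    print(price, before, after)
```

The first row has no predecessor and the last row has no successor, which is where the NULL values in the query results come from.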

Head and tail functions

FIRST_VALUE(expr) function

The FIRST_VALUE(expr) function returns the value of expr from the first row of the current window frame.
Example: Sort by price and query the price information of the first product

SELECT id, category, NAME, price, stock, FIRST_VALUE(price) OVER w AS first_price
FROM goods WINDOW w AS (PARTITION BY category_id ORDER BY price);

LAST_VALUE(expr) function

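The LAST_VALUE(expr) function returns the value of expr from the last row of the current window frame. When the window has an ORDER BY clause and no explicit frame, the default frame ends at the current row (and any rows tied with it), so an explicit frame is needed to reach the last row of the partition.
Example: Sort by price and query the price of the last product in each category.

SELECT id, category, NAME, price, stock, LAST_VALUE(price) OVER w AS last_price
FROM goods WINDOW w AS (PARTITION BY category_id ORDER BY price
	ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);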
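
The effect of the window frame on LAST_VALUE() is easy to see by running the function with and without an explicit frame. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, on the assumption that the bundled SQLite is version 3.25 or later; both databases use the same default frame when ORDER BY is present. The table here is a cut-down, hypothetical version of goods.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE goods (id INTEGER, category_id INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO goods VALUES (?, ?, ?)",
    [(5, 1, 29.90), (1, 1, 39.90), (2, 1, 79.90), (6, 1, 399.90)],
)

query = """
SELECT id, price,
       -- default frame: from the start of the partition up to the current row
       LAST_VALUE(price) OVER (PARTITION BY category_id ORDER BY price) AS default_frame,
       -- explicit frame: the whole partition
       LAST_VALUE(price) OVER (PARTITION BY category_id ORDER BY price
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS whole_partition
FROM goods
ORDER BY price
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the default frame each row simply sees its own price as the last value; only the explicit frame yields the highest price of the partition on every row.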

Other functions

NTH_VALUE(expr,n) function

The NTH_VALUE(expr, n) function returns the value of expr from the nth row of the current window frame, or NULL if the frame contains fewer than n rows.
Example: Query the price information ranked 2nd and 3rd in the goods data table.

SELECT id, category, NAME, price,NTH_VALUE(price, 2) OVER w AS second_price,
NTH_VALUE(price, 3) OVER w AS third_price
FROM goods WINDOW w AS (PARTITION BY category_id ORDER BY price);
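
Because the window above has an ORDER BY and no explicit frame, the first rows of each partition do not yet have a second or third row in their frame. The sketch below illustrates this with Python's built-in sqlite3 module as a stand-in for MySQL (assuming SQLite 3.25 or later) and a cut-down, hypothetical goods table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE goods (id INTEGER, category_id INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO goods VALUES (?, ?, ?)",
    [(5, 1, 29.90), (1, 1, 39.90), (2, 1, 79.90), (6, 1, 399.90)],
)

query = """
SELECT price,
       NTH_VALUE(price, 2) OVER w AS second_price,
       NTH_VALUE(price, 3) OVER w AS third_price
FROM goods
WINDOW w AS (PARTITION BY category_id ORDER BY price)
ORDER BY price
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)   # NULL comes back as None
```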

NTILE(n) function

The NTILE(n) function divides the ordered rows of a partition into n buckets and returns the bucket number of each row.
Example: Divide the products in the goods table into 3 groups according to their prices.

SELECT NTILE(3) OVER w AS nt,id, category, NAME, price
FROM goods WINDOW w AS (PARTITION BY category_id ORDER BY price);
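
How the rows are spread over the buckets matters when the row count is not a multiple of n. The Python sketch below reproduces the usual rule, under which the earlier buckets receive the extra rows; `ntile` is a hypothetical helper, not MySQL's implementation.

```python
def ntile(row_count, n):
    """Bucket number (1..n) for each of row_count ordered rows."""
    base, extra = divmod(row_count, n)
    buckets = []
    for bucket in range(1, n + 1):
        # The first `extra` buckets hold one row more than the rest.
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets


print(ntile(6, 3))   # six products per category, as in the goods table
print(ntile(7, 3))   # an uneven split
```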

Summary

Window functions can partition the data and sort it within each partition. Unlike GROUP BY, they do not reduce the number of rows returned, which makes them well suited to computing statistics and rankings alongside the original row data.

Exercise link

Reference link:
MySQL window function exercise


7. The difference between TRUNCATE, DELETE and DROP when deleting data

TRUNCATE TABLE statement:
(1) Delete all data in the table
(2) Release the storage space of the table

Example:

TRUNCATE TABLE detail_dept;

The TRUNCATE statement cannot be rolled back, whereas rows removed with the DELETE statement can be rolled back as long as the transaction has not been committed.

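Comparison:

SET autocommit = FALSE;   # otherwise the DELETE is committed immediately

DELETE FROM emp2;
#TRUNCATE TABLE emp2;     # use this instead of DELETE to see that ROLLBACK has no effect

SELECT * FROM emp2;       # the table is empty

ROLLBACK;

SELECT * FROM emp2;       # after DELETE the rows are back; after TRUNCATE they are not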
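
The DELETE half of this comparison can be reproduced without a MySQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in; SQLite has no TRUNCATE statement, so only the rollback of DELETE is shown, and the emp2 columns are hypothetical.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# transaction is controlled explicitly with BEGIN and ROLLBACK.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE emp2 (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO emp2 (name) VALUES (?)", [("a",), ("b",), ("c",)])


def row_count():
    return conn.execute("SELECT COUNT(*) FROM emp2").fetchone()[0]


conn.execute("BEGIN")
conn.execute("DELETE FROM emp2")
after_delete = row_count()      # rows are gone inside the transaction
conn.execute("ROLLBACK")
after_rollback = row_count()    # rows are restored by the rollback

print(after_delete, after_rollback)
```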

(1) Both TRUNCATE and DELETE can remove all the data in a table, but TRUNCATE is faster than DELETE and uses fewer system and transaction log resources.
(2) TRUNCATE has no WHERE clause and always removes every row. DELETE can be used on its own or combined with WHERE to delete one or more specific rows.
(3) The implementation principles differ. TRUNCATE removes the data in bulk instead of row by row (in MySQL it drops and re-creates the table), so very little is written to the logs. DELETE removes rows one at a time and records a log entry for every deleted row.

The following is a brief comparison of the similarities and differences between the three:
(1) TRUNCATE and DROP are DDL statements and cannot be rolled back after execution; DELETE is a DML statement and can be rolled back.
(2) TRUNCATE can only be used on tables; DELETE and DROP can be used on tables, views, etc.
(3) TRUNCATE removes all rows in the table, but the table structure and its constraints, indexes, etc. remain unchanged; DROP deletes the table structure together with the constraints, indexes, etc. that depend on it.
(4) TRUNCATE resets the auto-increment value of the table; DELETE does not.
(5) TRUNCATE does not fire the DELETE triggers defined on the table; DELETE does.
(6) After TRUNCATE, the space occupied by the table and its indexes returns to the initial size; DELETE does not reduce the space occupied by the table or its indexes; DROP releases all the space occupied by the table.


8. B-tree and B+ tree, MySQL clustered index and non-clustered index

Reference links:

B-tree and B+ tree

https://blog.csdn.net/qq_33905217/article/details/121827393

Clustered index and non-clustered index

https://blog.csdn.net/u010786653/article/details/123579393
https://blog.csdn.net/qinxiaoqiang2011/article/details/126290979


9. SQL functions: concat function, concat_ws() function, group_concat() function

SQL functions: concat function, concat_ws() function, group_concat() function
SQL concat() function


10. MySQL large table query optimization

MySQL large table query optimization


Origin blog.csdn.net/weixin_44123362/article/details/130234980