SQL Advanced Series 11: Making SQL Fly

Introduction

SQL performance optimization is an important issue that every database user must face. This section focuses on optimizing the way SQL is written. SQL performance is also affected by features specific to each database, but those are outside the scope of this section.

Use efficient queries

When the argument is a subquery, use EXISTS instead of IN

-- Table definitions for the EXISTS-instead-of-IN example
CREATE TABLE Class_A
(id char(1), 
 name varchar(30), 
 PRIMARY KEY(id));

CREATE TABLE Class_B
(id   char(1), 
 name varchar(30), 
 PRIMARY KEY(id));

INSERT INTO Class_A (id, name) VALUES('1', '田中');
INSERT INTO Class_A (id, name) VALUES('2', '铃木');
INSERT INTO Class_A (id, name) VALUES('3', '伊集院');

INSERT INTO Class_B (id, name) VALUES('1', '田中');
INSERT INTO Class_B (id, name) VALUES('2', '铃木');
INSERT INTO Class_B (id, name) VALUES('4', '西园寺');

-- Slow version
SELECT * FROM Class_A WHERE id IN (SELECT id FROM Class_B);
-- Fast version
SELECT * FROM Class_A WHERE EXISTS (SELECT * FROM Class_B WHERE Class_A.id = Class_B.id);

EXISTS is faster for roughly two reasons:

If an index exists on the join column (id), the query does not need to read the actual table Class_B; looking up the index is enough.
With EXISTS, the search stops as soon as one row satisfying the condition is found, instead of scanning the whole table as IN does. The same holds for NOT EXISTS.

When the argument is a subquery, use a join instead of IN

-- Use a join instead of IN
SELECT Class_A.id,Class_A.name
FROM Class_A INNER JOIN Class_B
ON  Class_A.id = Class_B.id;
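
As a quick check that the three forms above are interchangeable, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. SQLite is an assumption made only to keep the example self-contained; the sketch confirms that the results match and says nothing about relative speed on your database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Class_A (id char(1), name varchar(30), PRIMARY KEY(id));
CREATE TABLE Class_B (id char(1), name varchar(30), PRIMARY KEY(id));
INSERT INTO Class_A (id, name) VALUES ('1', '田中'), ('2', '铃木'), ('3', '伊集院');
INSERT INTO Class_B (id, name) VALUES ('1', '田中'), ('2', '铃木'), ('4', '西园寺');
""")

queries = {
    "in": "SELECT id, name FROM Class_A WHERE id IN (SELECT id FROM Class_B)",
    "exists": "SELECT id, name FROM Class_A WHERE EXISTS "
              "(SELECT * FROM Class_B WHERE Class_A.id = Class_B.id)",
    "join": "SELECT Class_A.id, Class_A.name FROM Class_A "
            "INNER JOIN Class_B ON Class_A.id = Class_B.id",
}

# Sort each result so the comparison does not depend on row order.
results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

All three queries return the two students who appear in both classes.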

Avoid sorting

Unlike procedural languages, SQL does not let the user explicitly order the database to perform a sort. Behind the scenes, however, the database sorts a great deal. Typical operations that trigger a sort are:

GROUP BY
ORDER BY
Aggregate functions (SUM, COUNT, AVG, MIN, MAX)
DISTINCT
Set operators (UNION, INTERSECT, EXCEPT)
Window functions (RANK, ROW_NUMBER)

Make flexible use of the ALL option of set operators

-- Get all ids and names
SELECT * FROM Class_A
UNION 
SELECT * FROM Class_B;

-- If you do not care about duplicate values, you can use the ALL option
SELECT * FROM Class_A
UNION ALL
SELECT * FROM Class_B;

Support for the ALL option in each database is shown in the following table (● supported, × not supported, - the set operator itself is not available):

	Oracle	DB2	SQL Server	PostgreSQL	MySQL
UNION	●	●	●	●	●
INTERSECT	×	●	×	●	-
EXCEPT	×	●	×	●	-
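
To make the difference between UNION and UNION ALL concrete, the following sketch runs both against the sample tables in an in-memory SQLite database (SQLite is an assumption chosen only for a self-contained example). UNION removes the two duplicated rows, which is the step that requires sorting or an equivalent deduplication pass; UNION ALL simply concatenates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Class_A (id char(1), name varchar(30), PRIMARY KEY(id));
CREATE TABLE Class_B (id char(1), name varchar(30), PRIMARY KEY(id));
INSERT INTO Class_A (id, name) VALUES ('1', '田中'), ('2', '铃木'), ('3', '伊集院');
INSERT INTO Class_B (id, name) VALUES ('1', '田中'), ('2', '铃木'), ('4', '西园寺');
""")

# UNION deduplicates the combined rows.
union_rows = conn.execute(
    "SELECT * FROM Class_A UNION SELECT * FROM Class_B").fetchall()

# UNION ALL keeps every row from both inputs, duplicates included.
union_all_rows = conn.execute(
    "SELECT * FROM Class_A UNION ALL SELECT * FROM Class_B").fetchall()

print(len(union_rows), len(union_all_rows))
```

Use UNION ALL only when duplicates are acceptable or known not to occur, since the two queries return different results here.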

Use EXISTS instead of DISTINCT

-- Table definitions for the EXISTS-instead-of-DISTINCT example
CREATE TABLE Items
 (item_no INTEGER PRIMARY KEY,
  item    VARCHAR(32) NOT NULL);

INSERT INTO Items VALUES(10, 'FD');
INSERT INTO Items VALUES(20, 'CD-R');
INSERT INTO Items VALUES(30, 'MO');
INSERT INTO Items VALUES(40, 'DVD');

CREATE TABLE SalesHistory
 (sale_date DATE NOT NULL,
  item_no   INTEGER NOT NULL,
  quantity  INTEGER NOT NULL,
  PRIMARY KEY(sale_date, item_no));

INSERT INTO SalesHistory VALUES('2007-10-01',  10,  4);
INSERT INTO SalesHistory VALUES('2007-10-01',  20, 10);
INSERT INTO SalesHistory VALUES('2007-10-01',  30,  3);
INSERT INTO SalesHistory VALUES('2007-10-03',  10, 32);
INSERT INTO SalesHistory VALUES('2007-10-03',  30, 12);
INSERT INTO SalesHistory VALUES('2007-10-04',  20, 22);
INSERT INTO SalesHistory VALUES('2007-10-04',  30,  7);

-- Find items that have sales records
SELECT Items.item_no
FROM Items INNER JOIN SalesHistory
ON Items.item_no = SalesHistory.item_no;

-- Remove duplicates (slow)
SELECT DISTINCT Items.item_no
FROM Items INNER JOIN SalesHistory
ON Items.item_no = SalesHistory.item_no;

-- Remove duplicates (fast)
SELECT item_no FROM Items WHERE EXISTS (SELECT * FROM SalesHistory WHERE Items.item_no = SalesHistory.item_no);
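
The sketch below loads the sample data into an in-memory SQLite database (an assumption made for a self-contained example) and shows why deduplication is needed at all: the plain join returns one row per sale, while the DISTINCT and EXISTS versions both return each sold item once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Items (item_no INTEGER PRIMARY KEY, item VARCHAR(32) NOT NULL);
CREATE TABLE SalesHistory (
  sale_date DATE NOT NULL, item_no INTEGER NOT NULL, quantity INTEGER NOT NULL,
  PRIMARY KEY (sale_date, item_no));
INSERT INTO Items VALUES (10, 'FD'), (20, 'CD-R'), (30, 'MO'), (40, 'DVD');
INSERT INTO SalesHistory VALUES
  ('2007-10-01', 10, 4), ('2007-10-01', 20, 10), ('2007-10-01', 30, 3),
  ('2007-10-03', 10, 32), ('2007-10-03', 30, 12),
  ('2007-10-04', 20, 22), ('2007-10-04', 30, 7);
""")

join_sql = ("SELECT Items.item_no FROM Items INNER JOIN SalesHistory "
            "ON Items.item_no = SalesHistory.item_no")
distinct_sql = join_sql.replace("SELECT", "SELECT DISTINCT", 1)
exists_sql = ("SELECT item_no FROM Items WHERE EXISTS "
              "(SELECT * FROM SalesHistory WHERE Items.item_no = SalesHistory.item_no)")

join_rows = conn.execute(join_sql).fetchall()          # one row per sale
distinct_rows = sorted(conn.execute(distinct_sql).fetchall())
exists_rows = sorted(conn.execute(exists_sql).fetchall())
print(len(join_rows), distinct_rows, exists_rows)
```

Item 40 (DVD) has no sales, so it is absent from every result.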

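Use indexes with extremum functions (MAX/MIN)

-- Written this way, a full table scan is required
SELECT MAX(item) FROM Items;
-- Written this way, the index can be used
SELECT MAX(item_no) FROM Items;

-- This does not eliminate the sort; it speeds up the lookup that precedes the sort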

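Conditions that can be written in the WHERE clause should not be written in the HAVING clause

-- Filter with the HAVING clause after aggregation
SELECT sale_date,SUM(quantity) 
FROM SalesHistory
GROUP BY sale_date
HAVING sale_date = '2007-10-01';
-- Filter with the WHERE clause before aggregation
SELECT sale_date,SUM(quantity)
FROM SalesHistory
WHERE sale_date = '2007-10-01'
GROUP BY sale_date;

-- Why the second version is more efficient: GROUP BY sorts while aggregating, so filtering rows out with WHERE beforehand lightens the sort; in addition, WHERE conditions can use indexes, whereas HAVING filters the view produced by aggregation, which in many cases does not inherit the index structure of the original table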
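
A small sketch confirming that the two versions are equivalent for this condition, again using an in-memory SQLite database as an assumed test bed. The rewrite is safe because the filter refers only to the grouping column, not to an aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesHistory (
  sale_date DATE NOT NULL, item_no INTEGER NOT NULL, quantity INTEGER NOT NULL,
  PRIMARY KEY (sale_date, item_no));
INSERT INTO SalesHistory VALUES
  ('2007-10-01', 10, 4), ('2007-10-01', 20, 10), ('2007-10-01', 30, 3),
  ('2007-10-03', 10, 32), ('2007-10-03', 30, 12),
  ('2007-10-04', 20, 22), ('2007-10-04', 30, 7);
""")

# Filter after aggregation: every date is grouped, then two groups are discarded.
having_rows = conn.execute("""
    SELECT sale_date, SUM(quantity) FROM SalesHistory
    GROUP BY sale_date HAVING sale_date = '2007-10-01'
""").fetchall()

# Filter before aggregation: only the rows for one date are grouped.
where_rows = conn.execute("""
    SELECT sale_date, SUM(quantity) FROM SalesHistory
    WHERE sale_date = '2007-10-01' GROUP BY sale_date
""").fetchall()

print(having_rows, where_rows)
```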

Use indexes in the GROUP BY and ORDER BY clauses

Is the index really being used?

Performing operations on the indexed column

-- The index is not used
SELECT * FROM SomeTable
WHERE col_1 * 1.1 > 100; 
-- The index is used
SELECT * FROM SomeTable
WHERE col_1  > 100 / 1.1; 
-- Applying a function on the left side also prevents index use
SELECT * FROM SomeTable
WHERE SUBSTR(col_1,1,1) = 'a'; -- To use an index, the left side of the condition should be the bare column
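
One way to verify this kind of claim is to ask the database for its execution plan. The sketch below does so with SQLite's EXPLAIN QUERY PLAN; the table contents and the index name idx_col_1 are hypothetical, and other databases have their own plan commands and may choose differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SomeTable (col_1 INTEGER, col_2 TEXT);
CREATE INDEX idx_col_1 ON SomeTable (col_1);  -- hypothetical index name
INSERT INTO SomeTable VALUES (50, 'a'), (90, 'b'), (91, 'c'), (100, 'd'), (200, 'e');
""")

def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

computed = "SELECT * FROM SomeTable WHERE col_1 * 1.1 > 100"  # arithmetic on the column
bare = "SELECT * FROM SomeTable WHERE col_1 > 100 / 1.1"      # bare column on the left

computed_rows = sorted(conn.execute(computed).fetchall())
bare_rows = sorted(conn.execute(bare).fetchall())

print(plan(computed))  # expected to describe a full scan of SomeTable
print(plan(bare))      # expected to mention idx_col_1
```

Both queries return the same rows; only the access path differs.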

Using the IS NULL predicate

NULL values are usually not stored in an index, so specifying IS NULL or IS NOT NULL generally prevents the index from being used and leads to poor query performance.

-- IS NULL cannot be optimized any further
SELECT * FROM SomeTable WHERE col_1 IS NULL;

-- For IS NOT NULL, rewrite the condition as "greater than a value smaller than the minimum"
SELECT * FROM SomeTable WHERE col_1 > 0; -- assuming the minimum value of col_1 is 1

Using negative forms

The index cannot be used with "<>", "!=", or "NOT IN".

-- Full table scan
SELECT * FROM SomeTable WHERE col_1 <> 100;
-- Negated form
SELECT * FROM SomeTable WHERE NOT (col_1 = 100);

Using OR

-- A case where the index cannot be used
SELECT * FROM SomeTable WHERE col_1 > 100 OR col_2 = 'abc';

Using a composite index with the columns in the wrong order

Suppose there is a composite index on the columns "col_1, col_2, col_3", in that order.

SELECT * FROM SomeTable WHERE col_1 = 10 AND col_2 = 100 AND col_3 = 500; -- '●'
SELECT * FROM SomeTable WHERE col_1 = 10 AND col_2 = 100;                 -- '●'
SELECT * FROM SomeTable WHERE col_1 = 10 AND col_3 = 500;                 -- 'x'
SELECT * FROM SomeTable WHERE col_2 = 100 AND col_3 = 500;                -- 'x'
SELECT * FROM SomeTable WHERE col_2 = 100 AND col_1 = 10;                 -- 'x'

Using the LIKE predicate with suffix or infix matching

Only prefix matching can use the index.

SELECT * FROM SomeTable WHERE col_1 LIKE '%a';  -- 'x'
SELECT * FROM SomeTable WHERE col_1 LIKE '%a%'; -- 'x'
SELECT * FROM SomeTable WHERE col_1 LIKE 'a%';  -- '●'

Implicit type conversion

Example: conditions specified on 'col_1', a column of type CHAR

SELECT * FROM SomeTable WHERE col_1 = 10;                  -- 'x'
SELECT * FROM SomeTable WHERE col_1 = '10';                -- '●'
SELECT * FROM SomeTable WHERE col_1 = CAST(10 AS CHAR(2)); -- '●'

Reduce intermediate tables

In SQL, the result of a subquery can be treated as a new table. Unrestrained use of such intermediate tables, however, degrades query performance.

Make flexible use of the HAVING clause

-- Pointless intermediate table
SELECT * FROM 
(SELECT sale_date,MAX(quantity) AS max_qty FROM SalesHistory GROUP BY sale_date) TMP
WHERE max_qty >= 10;
-- Use HAVING instead
SELECT sale_date,MAX(quantity) AS max_qty FROM SalesHistory GROUP BY sale_date HAVING MAX(quantity) >= 10;
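
The following sketch checks that filtering with HAVING gives the same rows as the intermediate-table version. It assumes an in-memory SQLite database and selects sale_date and MAX(quantity) explicitly in the HAVING query, since standard SQL does not allow SELECT * together with GROUP BY. The helper compare and its threshold parameter are hypothetical names introduced for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SalesHistory (
  sale_date DATE NOT NULL, item_no INTEGER NOT NULL, quantity INTEGER NOT NULL,
  PRIMARY KEY (sale_date, item_no));
INSERT INTO SalesHistory VALUES
  ('2007-10-01', 10, 4), ('2007-10-01', 20, 10), ('2007-10-01', 30, 3),
  ('2007-10-03', 10, 32), ('2007-10-03', 30, 12),
  ('2007-10-04', 20, 22), ('2007-10-04', 30, 7);
""")

def compare(threshold):
    """Run both forms with the same threshold and return their sorted results."""
    via_subquery = conn.execute("""
        SELECT * FROM
          (SELECT sale_date, MAX(quantity) AS max_qty
           FROM SalesHistory GROUP BY sale_date) TMP
        WHERE max_qty >= ?
    """, (threshold,)).fetchall()
    via_having = conn.execute("""
        SELECT sale_date, MAX(quantity) AS max_qty
        FROM SalesHistory GROUP BY sale_date
        HAVING MAX(quantity) >= ?
    """, (threshold,)).fetchall()
    return sorted(via_subquery), sorted(via_having)

print(compare(10))
print(compare(20))
```

With the threshold of 10 used in the text, all three dates qualify; raising it to 20 drops 2007-10-01 in both forms.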

When the IN predicate needs several columns, combine them into one

-- Using IN on several columns
SELECT id,state,city FROM Addresses1 A1 WHERE state IN (SELECT state FROM Addresses2 A2 WHERE A1.id = A2.id) AND city IN (SELECT city FROM Addresses2 A2 WHERE A1.id = A2.id);

-- Concatenating the columns (but this may cause type conversion problems and prevent index use)
SELECT * FROM Addresses1 A1 WHERE id || state || city IN (SELECT id || state || city FROM Addresses2 A2);

-- Optimized version
SELECT * FROM Addresses1 A1 WHERE (id,state,city) IN (SELECT id,state,city FROM Addresses2 A2);
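
The text does not give a schema for Addresses1 and Addresses2, so the sketch below assumes a hypothetical one (id as primary key plus state and city) with made-up rows, and compares the three forms in an in-memory SQLite database. The row-value form requires SQLite 3.15 or later; support in other databases varies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data, introduced only for this illustration.
conn.executescript("""
CREATE TABLE Addresses1 (id INTEGER PRIMARY KEY, state TEXT, city TEXT);
CREATE TABLE Addresses2 (id INTEGER PRIMARY KEY, state TEXT, city TEXT);
INSERT INTO Addresses1 VALUES (1, 'CA', 'LA'), (2, 'NY', 'NYC'), (3, 'TX', 'Austin');
INSERT INTO Addresses2 VALUES (1, 'CA', 'LA'), (2, 'NY', 'Albany'), (4, 'WA', 'Seattle');
""")

two_subqueries = """
    SELECT id, state, city FROM Addresses1 A1
    WHERE state IN (SELECT state FROM Addresses2 A2 WHERE A1.id = A2.id)
      AND city  IN (SELECT city  FROM Addresses2 A2 WHERE A1.id = A2.id)
"""
concatenated = """
    SELECT * FROM Addresses1 A1
    WHERE id || state || city IN (SELECT id || state || city FROM Addresses2 A2)
"""
row_value = """
    SELECT * FROM Addresses1 A1
    WHERE (id, state, city) IN (SELECT id, state, city FROM Addresses2 A2)
"""

rows = {name: sorted(conn.execute(sql).fetchall())
        for name, sql in [("two_subqueries", two_subqueries),
                          ("concatenated", concatenated),
                          ("row_value", row_value)]}
print(rows)
```

The row-value form needs a single uncorrelated subquery and compares the columns with their own types, which is why it avoids the conversion problem of the concatenated form.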

Join first, then aggregate

Use views sensibly

Section Summary

When the argument is a subquery, use EXISTS instead of IN
When using an index, the left side of the conditional expression should be the bare column
SQL cannot explicitly request a sort, but note that many operations sort behind the scenes
Keep useless intermediate tables to a minimum