MySQL Official Documentation: Range Optimization

The range access method uses a single index to retrieve a subset of table rows that are contained within one or several index value intervals. It can be used for a single-part or multiple-part index. The following sections give descriptions of conditions under which the optimizer uses range access.

1. The Range Access Method for Single-Part Indexes

2. The Range Access Method for Multiple-Part Indexes

3. Equality Range Optimization of Many-Valued Comparisons

4. Limiting Memory Use for Range Optimization

5. Range Optimization of Row Constructor Expressions

The Range Access Method for Single-Part Indexes

For a single-part index, index value intervals can be conveniently represented by corresponding conditions in the WHERE clause, denoted as range conditions rather than “intervals.”

The definition of a range condition for a single-part index is as follows:

(1) For both BTREE and HASH indexes, comparison of a key part with a constant value is a range condition when using the =, <=>, IN(), IS NULL, or IS NOT NULL operators.

(2) Additionally, for BTREE indexes, comparison of a key part with a constant value is a range condition when using the >, <, >=, <=, BETWEEN, !=, or <> operators, or LIKE comparisons if the argument to LIKE is a constant string that does not start with a wildcard character.

For all index types, multiple range conditions combined with OR or AND form a range condition.

“Constant value” in the preceding descriptions means one of the following:

(1) A constant from the query string

(2) A column of a const or system table from the same join

(3) The result of an uncorrelated subquery

(4) Any expression composed entirely from subexpressions of the preceding types

Here are some examples of queries with range conditions in the WHERE clause:

SELECT * FROM t1 WHERE key_col > 1 AND key_col < 10;
SELECT * FROM t1 WHERE key_col = 1 OR key_col IN (15,18,20);
SELECT * FROM t1 WHERE key_col LIKE 'ab%' OR key_col BETWEEN 'bar' AND 'foo';
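
One way to check whether the optimizer actually picks range access for conditions like these is to run EXPLAIN. The following is only a minimal sketch: the table definition and the index name idx_key_col are assumptions made up for illustration, and on a very small or empty table the optimizer may prefer a different plan.

```sql
-- Hypothetical test table (not taken from the MySQL manual).
CREATE TABLE t1 (
  id      INT PRIMARY KEY,
  key_col INT,
  INDEX idx_key_col (key_col)
);

-- When range access is chosen, EXPLAIN reports type = range
-- and key = idx_key_col for these statements.
EXPLAIN SELECT * FROM t1 WHERE key_col > 1 AND key_col < 10;
EXPLAIN SELECT * FROM t1 WHERE key_col = 1 OR key_col IN (15, 18, 20);
```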

Some nonconstant values may be converted to constants during the optimizer constant propagation phase.

MySQL tries to extract range conditions from the WHERE clause for each of the possible indexes. During the extraction process, conditions that cannot be used for constructing the range condition are dropped, conditions that produce overlapping ranges are combined, and conditions that produce empty ranges are removed.

(Translator's note: in short, MySQL tries to use the index wherever it can to build range conditions and narrow the interval that has to be scanned.)

Consider the following statement, where key1 is an indexed column and nonkey is not indexed:

SELECT * FROM t1 WHERE
(key1 < 'abc' AND (key1 LIKE 'abcde%' OR key1 LIKE '%b')) OR
(key1 < 'bar' AND nonkey = 4) OR
(key1 < 'uux' AND key1 > 'z');

The extraction process for key key1 is as follows:

1. Start with the original WHERE clause:

(key1 < 'abc' AND (key1 LIKE 'abcde%' OR key1 LIKE '%b')) OR
(key1 < 'bar' AND nonkey = 4) OR
(key1 < 'uux' AND key1 > 'z')

2. Remove nonkey = 4 and key1 LIKE '%b' because they cannot be used for a range scan. The correct way to remove them is to replace them with TRUE, so that we do not miss any matching rows when doing the range scan. Replacing them with TRUE yields:

(key1 < 'abc' AND (key1 LIKE 'abcde%' OR TRUE)) OR
(key1 < 'bar' AND TRUE) OR
(key1 < 'uux' AND key1 > 'z')

3. Collapse conditions that are always true or false:

• (key1 LIKE 'abcde%' OR TRUE) is always true

• (key1 < 'uux' AND key1 > 'z') is always false

Replacing these conditions with constants yields:

(key1 < 'abc' AND TRUE) OR (key1 < 'bar' AND TRUE) OR (FALSE)

Removing unnecessary TRUE and FALSE constants yields:

(key1 < 'abc') OR (key1 < 'bar')

4. Combining overlapping intervals into one yields the final condition to be used for the range scan:

(key1 < 'bar')
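
The effect of this extraction can be observed by running the same statement through EXPLAIN. This is a sketch under assumptions: the column types and the index name idx_key1 are invented here. If range access is chosen, the scanned range should correspond to key1 < 'bar', and the predicates that were dropped during extraction are rechecked against each row read (typically visible as Using where or Using index condition in the Extra column).

```sql
-- Hypothetical table: key1 is indexed, nonkey is not.
CREATE TABLE t1 (
  key1   VARCHAR(20),
  nonkey INT,
  INDEX idx_key1 (key1)
);

-- The full WHERE clause is kept in the query; only the range
-- derived from it is used to decide which index entries to read.
EXPLAIN SELECT * FROM t1 WHERE
  (key1 < 'abc' AND (key1 LIKE 'abcde%' OR key1 LIKE '%b')) OR
  (key1 < 'bar' AND nonkey = 4) OR
  (key1 < 'uux' AND key1 > 'z');
```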

In general (and as demonstrated by the preceding example), the condition used for a range scan is less restrictive than the WHERE clause. MySQL performs an additional check to filter out rows that satisfy the range condition but not the full WHERE clause.

The range condition extraction algorithm can handle nested AND/OR constructs of arbitrary depth, and its output does not depend on the order in which conditions appear in the WHERE clause.

MySQL does not support merging multiple ranges for the range access method for spatial indexes. To work around this limitation, you can use a UNION with identical SELECT statements, except that you put each spatial predicate in a different SELECT.

(Translator's note: I have not fully grasped this workaround yet.)
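
A possible reading of the UNION workaround is sketched here. It assumes MySQL 8.0 (for the SRID column attribute); the table, column, index, and variable names are all hypothetical. Each SELECT carries exactly one spatial predicate, and UNION removes rows that match both, which gives the same result set as combining the predicates with OR.

```sql
-- Hypothetical table with a spatial index.
CREATE TABLE places (
  id INT PRIMARY KEY,
  g  GEOMETRY NOT NULL SRID 0,
  SPATIAL INDEX idx_g (g)
);

SET @area1 = ST_GeomFromText('POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))');
SET @area2 = ST_GeomFromText('POLYGON((20 20, 20 30, 30 30, 30 20, 20 20))');

-- One SELECT per spatial predicate, instead of
--   WHERE MBRContains(@area1, g) OR MBRContains(@area2, g)
SELECT id FROM places WHERE MBRContains(@area1, g)
UNION
SELECT id FROM places WHERE MBRContains(@area2, g);
```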

The Range Access Method for Multiple-Part Indexes

Range conditions on a multiple-part index are an extension of range conditions for a single-part index. A range condition on a multiple-part index restricts index rows to lie within one or several key tuple intervals. Key tuple intervals are defined over a set of key tuples, using ordering from the index.

For example, consider a multiple-part index defined as key1(key_part1, key_part2, key_part3), and the following set of key tuples listed in key order:

key_part1 key_part2 key_part3
NULL       1         'abc'
NULL       1         'xyz'
NULL       2         'foo'
1          1         'abc'
1          1         'xyz'
1          2         'abc'
2          1         'aaa'

The condition key_part1 = 1 defines this interval (inf denotes infinity):

(1,-inf,-inf) <= (key_part1,key_part2,key_part3) < (1,+inf,+inf)

The interval covers the 4th, 5th, and 6th tuples in the preceding data set and can be used by the range access method.

By contrast, the condition key_part3 = 'abc' does not define a single interval and cannot be used by the range access method.
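
To experiment with these two conditions, the data set above can be loaded into a table. This is a sketch only: the table name t2 and the column types are assumptions. The exact access type that EXPLAIN reports depends on the server version and table size, so the point is to compare the type, key, key_len, and rows columns between the two statements.

```sql
-- Hypothetical table holding the key tuples listed above.
CREATE TABLE t2 (
  key_part1 INT,
  key_part2 INT,
  key_part3 VARCHAR(10),
  INDEX key1 (key_part1, key_part2, key_part3)
);

INSERT INTO t2 VALUES
  (NULL, 1, 'abc'), (NULL, 1, 'xyz'), (NULL, 2, 'foo'),
  (1, 1, 'abc'), (1, 1, 'xyz'), (1, 2, 'abc'), (2, 1, 'aaa');

-- The leading key part is fixed, so the matching index entries
-- (4th, 5th, and 6th tuples) are contiguous: one interval.
EXPLAIN SELECT * FROM t2 WHERE key_part1 = 1;

-- Only the last key part is constrained. The matching entries
-- (1st, 4th, and 6th tuples) are scattered, so no single interval exists.
EXPLAIN SELECT * FROM t2 WHERE key_part3 = 'abc';
```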

The following descriptions indicate how range conditions work for multiple-part indexes in greater detail.

For HASH indexes, each interval containing identical values can be used. This means that the interval can be produced only for conditions in the following form:

key_part1 cmp const1  
AND key_part2 cmp const2
AND ...
AND key_partN cmp constN;

Here, const1, const2, … are constants, cmp is one of the =, <=>, or IS NULL comparison operators, and the conditions cover all index parts. (That is, there are N conditions, one for each part of an N-part index.)

For example, the following is a range condition for a three-part HASH index:

key_part1 = 1 AND key_part2 IS NULL AND key_part3 = 'foo'
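
HASH indexes are available in the MEMORY storage engine, so the rule can be tried out roughly as follows. The table name h and the index name idx_h are hypothetical; this is a sketch for comparing the two EXPLAIN outputs, not a statement of what they will contain on every server.

```sql
-- Hypothetical MEMORY table with a three-part HASH index.
CREATE TABLE h (
  key_part1 INT,
  key_part2 INT,
  key_part3 VARCHAR(10),
  INDEX idx_h (key_part1, key_part2, key_part3) USING HASH
) ENGINE = MEMORY;

-- All three parts are compared with = or IS NULL,
-- which is the form a hash index can serve.
EXPLAIN SELECT * FROM h
WHERE key_part1 = 1 AND key_part2 IS NULL AND key_part3 = 'foo';

-- Only a prefix of the index is constrained, so the conditions
-- do not cover all index parts.
EXPLAIN SELECT * FROM h WHERE key_part1 = 1;
```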

For a BTREE index, an interval might be usable for conditions combined with AND, where each condition compares a key part with a constant value using =, <=>, IS NULL, >, <, >=, <=, !=, <>, BETWEEN, or LIKE 'pattern' (where 'pattern' does not start with a wildcard). An interval can be used as long as it is possible to determine a single key tuple containing all rows that match the condition (or two intervals if <> or != is used).

The optimizer attempts to use additional key parts to determine the interval as long as the comparison operator is =, <=>, or IS NULL. If the operator is >, <, >=, <=, !=, <>, BETWEEN, or LIKE, the optimizer uses it but considers no more key parts. For the following expression, the optimizer uses = from the first comparison. It also uses >= from the second comparison but considers no further key parts and does not use the third comparison for interval construction:

key_part1 = 'foo' AND key_part2 >= 10 AND key_part3 > 10

The single interval is:

('foo',10,-inf) < (key_part1,key_part2,key_part3) < ('foo',+inf,+inf)

It is possible that the created interval contains more rows than the initial condition. For example, the preceding interval includes the value ('foo', 11, 0), which does not satisfy the original condition (the third condition requires key_part3 > 10, but the third value is 0).
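
A rough way to see how many key parts were used for the interval is the key_len column of EXPLAIN. In this sketch the table name t3 and the column types are assumptions. Following the description above, key_len should cover only key_part1 and key_part2, while key_part3 > 10 is checked separately for each entry read.

```sql
-- Hypothetical table; NOT NULL columns keep key_len easy to interpret.
CREATE TABLE t3 (
  key_part1 VARCHAR(10) NOT NULL,
  key_part2 INT NOT NULL,
  key_part3 INT NOT NULL,
  INDEX key1 (key_part1, key_part2, key_part3)
);

-- The = comparison lets the optimizer continue to key_part2;
-- the >= comparison on key_part2 ends interval construction.
EXPLAIN SELECT * FROM t3
WHERE key_part1 = 'foo' AND key_part2 >= 10 AND key_part3 > 10;
```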

If conditions that cover sets of rows contained within intervals are combined with OR, they form a condition that covers a set of rows contained within the union of their intervals. If the conditions are combined with AND, they form a condition that covers a set of rows contained within the intersection of their intervals. For example, for this condition on a two-part index:

(Translator's note: in other words, OR yields the union of the intervals and AND yields their intersection.)

(key_part1 = 1 AND key_part2 < 2) OR (key_part1 > 5)

The intervals are:

(1,-inf) < (key_part1,key_part2) < (1,2)
(5,-inf) < (key_part1,key_part2)

In this example, the interval on the first line uses one key part for the left bound and two key parts for the right bound. The interval on the second line uses only one key part. The key_len column in the EXPLAIN output indicates the maximum length of the key prefix used.

In some cases, key_len may indicate that a key part was used, but that might not be what you would expect. Suppose that key_part1 and key_part2 can be NULL. Then the key_len column displays two key part lengths for the following condition:

key_part1 >= 1 AND key_part2 < 2

But, in fact, the condition is converted to this:

key_part1 >= 1 AND key_part2 IS NOT NULL

The section "The Range Access Method for Single-Part Indexes" describes how optimizations are performed to combine or eliminate intervals for range conditions on a single-part index. Analogous steps are performed for range conditions on multiple-part indexes.

Equality Range Optimization of Many-Valued Comparisons

Consider these expressions, where col_name is an indexed column:

col_name IN(val1, ..., valN)
col_name = val1 OR ... OR col_name = valN

Each expression is true if col_name is equal to any of several values. These comparisons are equality range comparisons (where the “range” is a single value). The optimizer estimates the cost of reading qualifying rows for equality range comparisons as follows:

1. If there is a unique index on col_name, the row estimate for each range is 1 because at most one row can have the given value.

2. Otherwise, any index on col_name is nonunique and the optimizer can estimate the row count for each range using dives into the index or index statistics.

With index dives, the optimizer makes a dive at each end of a range and uses the number of rows in the range as the estimate. For example, the expression col_name IN (10, 20, 30) has three equality ranges and the optimizer makes two dives per range to generate a row estimate. Each pair of dives yields an estimate of the number of rows that have the given value.

Index dives provide accurate row estimates, but as the number of comparison values in the expression increases, the optimizer takes longer to generate a row estimate. Use of index statistics is less accurate than index dives but permits faster row estimation for large value lists.

The eq_range_index_dive_limit system variable enables you to configure the number of values at which the optimizer switches from one row estimation strategy to the other. To permit use of index dives for comparisons of up to N equality ranges, set eq_range_index_dive_limit to N + 1. To disable use of statistics and always use index dives regardless of N, set eq_range_index_dive_limit to 0.

To update table index statistics for best estimates, use ANALYZE TABLE.
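
The settings described above might be applied along these lines. This is a sketch: the value 50 for N and the table t1 are arbitrary examples, not recommendations.

```sql
-- Hypothetical table, created only so that ANALYZE TABLE has a target.
CREATE TABLE IF NOT EXISTS t1 (
  id      INT PRIMARY KEY,
  key_col INT,
  INDEX idx_key_col (key_col)
);

-- Inspect the current threshold.
SHOW VARIABLES LIKE 'eq_range_index_dive_limit';

-- Permit index dives for up to N = 50 equality ranges (N + 1 = 51),
-- for the current session only.
SET SESSION eq_range_index_dive_limit = 51;

-- Always use index dives, regardless of the number of values.
SET SESSION eq_range_index_dive_limit = 0;

-- Refresh index statistics so that statistics-based estimates are current.
ANALYZE TABLE t1;
```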

Prior to MySQL 8.0, there is no way of skipping the use of index dives to estimate index usefulness, except by using the eq_range_index_dive_limit system variable. In MySQL 8.0, index dive skipping is possible for queries that satisfy all these conditions:

(1) The query is for a single table, not a join on multiple tables.

(2) A single-index FORCE INDEX index hint is present. The idea is that if index use is forced, there is nothing to be gained from the additional overhead of performing dives into the index.

(3) The index is nonunique and not a FULLTEXT index.

(4) No subquery is present.

(5) No DISTINCT, GROUP BY, or ORDER BY clause is present.

For EXPLAIN FOR CONNECTION, the output changes as follows if index dives are skipped:

(1) For traditional output, the rows and filtered values are NULL.

(2) For JSON output, rows_examined_per_scan and rows_produced_per_join do not appear, skip_index_dive_due_to_force is true, and cost calculations are not accurate.

Without FOR CONNECTION, EXPLAIN output does not change when index dives are skipped.

After execution of a query for which index dives are skipped, the corresponding row in the INFORMATION_SCHEMA.OPTIMIZER_TRACE table contains an index_dives_for_range_access value of skipped_due_to_force_index.
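
A query that meets the conditions listed above could be traced roughly as follows. This sketch assumes MySQL 8.0; the table t1 and the index idx_key_col are hypothetical. The value to look for in the returned trace is index_dives_for_range_access.

```sql
-- Hypothetical single table with a nonunique index.
CREATE TABLE t1 (
  id      INT PRIMARY KEY,
  key_col INT,
  INDEX idx_key_col (key_col)
);

-- Enable optimizer tracing for this session.
SET optimizer_trace = 'enabled=on';

-- Single table, single-index FORCE INDEX, no subquery,
-- no DISTINCT, GROUP BY, or ORDER BY.
SELECT * FROM t1 FORCE INDEX (idx_key_col) WHERE key_col IN (10, 20, 30);

-- Read the trace recorded for the statement above.
SELECT TRACE FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;

SET optimizer_trace = 'enabled=off';
```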

Limiting Memory Use for Range Optimization

To control the memory available to the range optimizer, use the range_optimizer_max_mem_size system variable:

(1) A value of 0 means “no limit.”

(2) With a value greater than 0, the optimizer tracks the memory consumed when considering the range access method. If the specified limit is about to be exceeded, the range access method is abandoned and other methods, including a full table scan, are considered instead. This could be less optimal. If this happens, the following warning occurs (where N is the current range_optimizer_max_mem_size value):

Warning 3170 Memory capacity of N bytes for 'range_optimizer_max_mem_size' exceeded. Range optimization was not done for this query.

For individual queries that exceed the available range optimization memory and for which the optimizer falls back to less optimal plans, increasing the range_optimizer_max_mem_size value may improve performance.
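
Checking and adjusting the limit for one session might look like this. It is only a sketch; the 16 MB figure is an arbitrary example and not a recommended value.

```sql
-- Inspect the current limit (in bytes).
SHOW VARIABLES LIKE 'range_optimizer_max_mem_size';

-- Raise the limit for the current session only (16 MB here).
SET SESSION range_optimizer_max_mem_size = 16777216;

-- After running the query in question, check whether
-- warning 3170 was raised for it.
SHOW WARNINGS;

-- Alternatively, remove the limit for this session.
SET SESSION range_optimizer_max_mem_size = 0;
```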

To estimate the amount of memory needed to process a range expression, use these guidelines:

(1) For a simple query such as the following, where there is one candidate key for the range access method, each predicate combined with OR uses approximately 230 bytes:

SELECT COUNT(*) FROM t WHERE a=1 OR a=2 OR a=3 OR ... a=N;

(2) Similarly for a query such as the following, each predicate combined with AND uses approximately 125 bytes:

SELECT COUNT(*) FROM t WHERE a=1 AND b=1 AND c=1 ... N

(3) For a query with IN() predicates:

SELECT COUNT(*) FROM t WHERE a IN (1,2, ..., M) AND b IN (1,2, ..., N);

Each literal value in an IN() list counts as a predicate combined with OR. If there are two IN() lists, the number of predicates combined with OR is the product of the number of literal values in each list. Thus, the number of predicates combined with OR in the preceding case is M × N.
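
As a back-of-the-envelope check, the per-predicate figures above can simply be multiplied out. The counts used here (1000 OR predicates, 3 AND predicates, and IN() lists with M = 100 and N = 200 values) are made-up examples, and the results are rough estimates rather than exact memory usage.

```sql
SELECT
  230 * 1000      AS or_bytes,       -- 1000 predicates combined with OR
  125 * 3         AS and_bytes,      -- 3 predicates combined with AND
  230 * 100 * 200 AS in_list_bytes;  -- M x N = 100 x 200 OR predicates
```

This works out to roughly 230 KB, 375 bytes, and 4.6 MB respectively. It shows how quickly two moderately sized IN() lists can approach a limit of a few megabytes.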

Range Optimization of Row Constructor Expressions

The optimizer is able to apply the range scan access method to queries of this form:

SELECT ... FROM t1 WHERE ( col_1, col_2 ) IN (( 'a', 'b' ), ( 'c', 'd' ));

Previously, for range scans to be used, it was necessary to write the query as:

SELECT ... FROM t1 WHERE ( col_1 = 'a' AND col_2 = 'b' )
OR ( col_1 = 'c' AND col_2 = 'd' );

For the optimizer to use a range scan, queries must satisfy these conditions:

(1) Only IN() predicates are used, not NOT IN().

(2) On the left side of the IN() predicate, the row constructor contains only column references. (Translator's note: that is, the left side may contain only column names.)

(3) On the right side of the IN() predicate, row constructors contain only runtime constants, which are either literals or local column references that are bound to constants during execution. (Translator's note: my reading is that the right side may hold a const value, or a column value that is effectively constant during execution.)

(4) On the right side of the IN() predicate, there is more than one row constructor. (Translator's note: with only one row constructor the predicate is effectively an = comparison, so at least two are needed.)
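
The two query forms can be compared side by side with EXPLAIN. In this sketch the column types, the payload column, and the index name idx_c1_c2 are assumptions. On a server that supports this optimization, both statements would be expected to show the same access method.

```sql
-- Hypothetical table with a two-part index on (col_1, col_2).
CREATE TABLE t1 (
  col_1   CHAR(1),
  col_2   CHAR(1),
  payload INT,
  INDEX idx_c1_c2 (col_1, col_2)
);

-- Row constructor form: only column references on the left,
-- more than one row of literals on the right.
EXPLAIN SELECT payload FROM t1
WHERE (col_1, col_2) IN (('a', 'b'), ('c', 'd'));

-- Equivalent form that was required previously.
EXPLAIN SELECT payload FROM t1
WHERE (col_1 = 'a' AND col_2 = 'b')
   OR (col_1 = 'c' AND col_2 = 'd');
```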

Corrections from readers on anything I have misunderstood are very welcome. Thank you!

Previous post: https://blog.csdn.net/qwerdf10010/article/details/80514301