Composite Index Considerations

The principles and mechanics of indexing are covered exhaustively in books and online, and the mainstream database systems largely agree on them. I will not repeat the usual rules for choosing index columns here (foreign key columns, columns with small data types, columns frequently used for filtering or sorting, columns used to join tables, and so on). I have seen many indexes created by other people at work, and I remember my own days of having only theory and no practice, when the theory never seemed to map onto the concrete problem in front of me. What follows is just a small collection of experience and problem scenarios gathered from work and study. Some common points to keep in mind:

  1. Build indexes on columns with high selectivity (number of distinct key values / total number of records).
  2. In a composite index, put the more selective columns first.
  3. When the WHERE clause contains two highly selective columns, consider indexing each column separately; the engine can use both indexes at the same time. (For OR conditions, separate indexes are effectively required.)
  4. Do not create redundant indexes whose column lists contain one another, such as index1(a,b,c), index2(a,b), index3(a).
  5. Keep composite indexes narrow. With more than 4 columns, it is generally worth considering a split into several single-column indexes or simpler composite indexes.
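
To make the selectivity ratio from point 1 concrete, here is a minimal sketch in Python using the standard sqlite3 module as a stand-in engine. The `orders` table, its columns, and the helper `column_selectivity` are hypothetical names invented for illustration; the same `COUNT(DISTINCT ...) / COUNT(*)` query can be run directly in SQL Server or Oracle.

```python
import sqlite3

def column_selectivity(conn, table, column):
    """Return distinct-values / total-rows for one column (0.0 for an empty table)."""
    # Identifiers cannot be bound as parameters, so this sketch assumes trusted names.
    # Note that COUNT(DISTINCT col) ignores NULLs.
    distinct, total = conn.execute(
        f'SELECT COUNT(DISTINCT "{column}"), COUNT(*) FROM "{table}"'
    ).fetchone()
    return distinct / total if total else 0.0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_no INTEGER, state INTEGER)")
# 1000 rows: order_no is unique, state takes only 4 distinct values.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(i, i % 4) for i in range(1000)],
)

sel_order_no = column_selectivity(conn, "orders", "order_no")
sel_state = column_selectivity(conn, "orders", "state")
print("order_no:", sel_order_no)
print("state:   ", sel_state)
```

A value close to 1 (like `order_no` here) suggests a good leading index column; a value close to 0 (like `state`) suggests the column filters poorly on its own.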

Last but not least, do not overuse indexes. Too many indexes increase physical storage overhead, add processing cost to every insert, delete, and update, and give the optimizer more candidates to cost when it chooses an index.

Both too many indexes and missing or poorly chosen indexes hurt performance. In short, create indexes with care: analyze whether each one is really needed, and have a concrete reason for creating it.

Consider the scenario below. Which of the two indexes is actually effective for this query?

select  *
from    t1, t2
where   t1.col_1 = t2.ab and t1.col_2 in (12, 38);

-- Create a composite index as follows
create index idx_t1_query on t1(col_1, col_2);
-- or just create a single-column index as follows
create index idx_t1_col2 on t1(col_2);

As another example, the two most common queries against this table are shown below. How should the indexes be created?

select  *
from    t1
where   t1.PartId = 'xxxx' and t1.STATE = 2 and t1.PROCID = 'yyyy';

select  *
from    t1
where   (t1.PartId = 'xxxx' or t1.ActualPartId = 'xxxx') and t1.STATE = 2 and t1.PROCID = 'yyyy';


-- Create one "full-coverage" index that contains every column used in the query conditions
create index idx_t1_query on t1(partId, actualpartId, state, procid);

-- or create the following two indexes separately
create index idx_t1_PartId on t1(partId, state, procid);
create index idx_t1_actualPartId on t1(actualpartId, state, procid);

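Judging by the execution plans and the logical I/O statistics, the second option is clearly more effective in both scenarios; if you are interested, test and verify it yourself. Production environments are of course far more complex than this: the data volume and data distribution of each table affect how the engine executes a query, and the engine's choice of indexes, and what it needs from them, will vary accordingly. The simple statements here are for illustration only.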
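
One rough way to start such a test is to print the plan the engine picks under each indexing option. The sketch below uses Python's sqlite3 module only because it runs anywhere; SQLite is an assumption standing in for SQL Server or Oracle, the table is empty, the `payload` column and the helper `plan_with` are hypothetical, and real engines with real data may well choose differently.

```python
import sqlite3

QUERY = """
select *
from   t1
where  (t1.PartId = 'xxxx' or t1.ActualPartId = 'xxxx')
  and  t1.STATE = 2 and t1.PROCID = 'yyyy'
"""

def plan_with(index_ddl):
    """Create t1 with the given indexes in a fresh in-memory database and return the plan text."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t1 (PartId text, ActualPartId text, STATE integer, "
        "PROCID text, payload text)"
    )
    for ddl in index_ddl:
        conn.execute(ddl)
    rows = conn.execute("explain query plan " + QUERY).fetchall()
    conn.close()
    # The last column of each row is the human-readable plan step.
    return "\n".join(row[-1] for row in rows)

single_plan = plan_with([
    "create index idx_t1_query on t1(partId, actualpartId, state, procid)",
])
separate_plan = plan_with([
    "create index idx_t1_PartId on t1(partId, state, procid)",
    "create index idx_t1_actualPartId on t1(actualpartId, state, procid)",
])

print("one full-coverage index:")
print(single_plan)
print("two separate indexes:")
print(separate_plan)
```

With the two separate indexes, each branch of the OR has an index that leads with its own column, which is what lets the engine probe both. In SQL Server the equivalent check would be the actual execution plan together with `SET STATISTICS IO ON`.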

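Query scenarios for a composite index:

Composite index Index (A, B, C)

  • The following conditions can use the composite index:
    • A>5
    • A=5 AND B>6
    • A=5 AND B=6 AND C=7
    • A=5 AND B=6 AND C IN (2, 3)
  • The following conditions cannot use the composite index:
    • B>5                            (the condition does not include the leading column of the index)
    • B=6 AND C=7                    (same reason)
  • The following conditions can use only part of the composite index:
    • A>5 AND B=2                    (the range condition is on the first column, so only the first column is used)
    • A=5 AND B>6 AND C=2            (the range condition is on the second column, so only the first two columns are used)
    • A=5 AND B IN (2, 3) AND C=2    (same reason when the engine treats IN as a range; some engines expand IN into equality lookups and can use all three columns)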
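
The leftmost-prefix behaviour listed above can be observed with a small experiment. This is a sketch under assumptions: it uses SQLite through Python's sqlite3 module as a stand-in engine, the table `tb` is empty, and the extra column `d` is hypothetical (it only keeps the index from covering `SELECT *`). Other engines print their plans differently and may decide differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb (a INTEGER, b INTEGER, c INTEGER, d TEXT)")
conn.execute("CREATE INDEX idx_abc ON tb (a, b, c)")

def plan(where):
    """Return SQLite's query plan text for SELECT * FROM tb WHERE <where>."""
    rows = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM tb WHERE " + where).fetchall()
    return "\n".join(row[-1] for row in rows)

conditions = [
    "a = 5 AND b = 6 AND c = 7",   # all three columns usable
    "b = 6 AND c = 7",             # leading column missing
    "a > 5 AND b = 2",             # range on the first column
    "a = 5 AND b > 6 AND c = 2",   # range on the second column
]
plans = {cond: plan(cond) for cond in conditions}
for cond, text in plans.items():
    print(f"{cond:28} -> {text}")
```

In the printed plans, the parenthesised part after the index name shows which index columns take part in the seek, for example `(a=? AND b>?)` when only the first two columns are usable.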

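Sort scenarios for a composite index:

Composite index Index(A, B)

  • The following can use the composite index for sorting:
    • ORDER BY A                   (sort on the leading column)
    • A=5 ORDER BY B               (filter on the first column, then sort on the second)
    • ORDER BY A DESC, B DESC      (note: both columns are sorted in the same direction)
    • A>5 ORDER BY A               (the filter and the sort are both on the first column)
  • The following cannot use the composite index for sorting:
    • ORDER BY B                   (the sort is on the second column of the index)
    • A>5 ORDER BY B               (range condition on the first column, sort on the second)
    • A IN(1,2) ORDER BY B         (same reason)
    • ORDER BY A ASC, B DESC       (note: the two columns are sorted in different directions)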
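
Whether a sort is served by the index can also be read from the plan. The sketch below again assumes SQLite (via Python's sqlite3) as a stand-in engine, where an explicit sort step shows up as a plan line containing "TEMP B-TREE". The helper `needs_sort` is hypothetical, and the table holds only columns `a` and `b`, so the query selects those two columns rather than `*`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb (a INTEGER, b INTEGER)")
conn.execute("CREATE INDEX idx_ab ON tb (a, b)")

def needs_sort(tail):
    """True if the plan for SELECT a, b FROM tb <tail> contains an explicit sort step."""
    rows = conn.execute("EXPLAIN QUERY PLAN SELECT a, b FROM tb " + tail).fetchall()
    return any("TEMP B-TREE" in row[-1] for row in rows)

cases = [
    "ORDER BY a",
    "WHERE a = 5 ORDER BY b",
    "ORDER BY a DESC, b DESC",
    "WHERE a > 5 ORDER BY a",
    "ORDER BY b",
    "WHERE a > 5 ORDER BY b",
    "ORDER BY a ASC, b DESC",
]
results = {tail: needs_sort(tail) for tail in cases}
for tail, sort in results.items():
    print(f"{tail:28} explicit sort needed: {sort}")
```

The first four cases read rows in index order, while the last three need an extra sort step because the requested order does not match the order of the index entries.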

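A brief note on index merging:

  • The database can use several indexes at the same time
    • SELECT * FROM TB WHERE A=5 AND B=6
      • It can use index (A) and index (B) separately;
      • For this statement, a composite index (A, B) is better;
      • Whether to end up with the composite index or two single-column indexes depends mainly on whether the application also runs statements like: SELECT * FROM TB WHERE B=6
    • SELECT * FROM TB WHERE A=5 OR B=6
      • The composite index (A, B) cannot serve this query (current databases are fairly smart and can use a composite index for some OR conditions, but the results are not very stable);
      • Clearly, separate indexes (A) and (B) are better;
  • Drop useless redundant indexes
    • Table TB has two indexes, (A, B) and (A), serving two SQL statements: SELECT * FROM TB WHERE A=5 AND B=6 and SELECT * FROM TB WHERE A=5
      • At execution time, it is not the case that WHERE A=5 uses (A) and WHERE A=5 AND B=6 uses (A, B);
      • The query optimizer tends to settle on one of them, the one it has used before: either both statements use (A, B), or both use (A).
      • So index (A) should be dropped; it is already contained in (A, B) and has no reason to exist.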
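
Redundant indexes of this kind can be found mechanically by checking whether one index's column list is a leading prefix of another's. Here is a minimal sketch, assuming SQLite and its `PRAGMA index_list` / `PRAGMA index_info` catalog commands; the helpers `index_columns` and `prefix_redundant` are hypothetical names, and on SQL Server or Oracle the column lists would come from the catalog queries in the appendix instead. It does not look at uniqueness, so an index that enforces a unique constraint still needs a manual check before it is dropped.

```python
import sqlite3

def index_columns(conn, table):
    """Map each index name on `table` to its ordered list of column names."""
    result = {}
    for row in conn.execute(f'PRAGMA index_list("{table}")').fetchall():
        name = row[1]  # row layout: seq, name, unique, origin, partial
        info = conn.execute(f'PRAGMA index_info("{name}")').fetchall()
        result[name] = [r[2] for r in info]  # row layout: seqno, cid, column name
    return result

def prefix_redundant(indexes):
    """Return (shorter, longer) pairs where shorter's columns are a leading prefix of longer's."""
    pairs = []
    for short, s_cols in indexes.items():
        for long_, l_cols in indexes.items():
            if short != long_ and len(s_cols) < len(l_cols) and l_cols[:len(s_cols)] == s_cols:
                pairs.append((short, long_))
    return sorted(pairs)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb (a INTEGER, b INTEGER, c INTEGER)")
conn.execute("CREATE INDEX index1 ON tb (a, b, c)")
conn.execute("CREATE INDEX index2 ON tb (a, b)")
conn.execute("CREATE INDEX index3 ON tb (a)")

redundant = prefix_redundant(index_columns(conn, "tb"))
for short, long_ in redundant:
    print(f"{short} is a leading prefix of {long_}")
```

Each reported pair is only a candidate for removal; the narrower index may still be worth keeping if measurements show it is noticeably cheaper for a hot query.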

Appendix: querying the index definitions of a given table:

-- SQL Server:
exec sp_helpindex 'tableName';
-- or
select  t2.name tabName, t3.name indName, t4.name colName, t1.*
from	sys.index_columns t1
	join sys.tables t2 on t1.object_id = t2.object_id
	join sys.indexes t3 on t2.object_id = t3.object_id and t1.index_id = t3.index_id
	join sys.columns t4 on t2.object_id = t4.object_id and t1.column_id = t4.column_id
where	t2.name = 'tableName'
order by t3.name, t1.index_column_id

--Oracle:
select  * 
from    user_ind_columns a 
where   a.TABLE_NAME = upper('tableName') 
order by a.INDEX_NAME, a.COLUMN_POSITION;

