SQL Server index fragmentation and fill factor

Original text from: http://www.cnblogs.com/CareySon/archive/2012/01/06/2313897.html

Index fragmentation comes in two kinds: internal and external.

First, note that the "external" in external fragmentation is relative to the page. External fragmentation is fragmentation caused by page splits. For example, suppose I insert a row into an existing clustered index and the target page does not have enough free space to hold the new row. The page then splits.

     In SQL Server, new pages are allocated as the data grows, while a clustered index requires rows to stay in key order. After a page split, the new page is therefore often not physically adjacent to the original page on disk.

     This is called external fragmentation.

     A page split moves rows between pages, so if inserts, updates, and similar operations frequently trigger splits, I/O consumption rises sharply and performance degrades.

     For lookups with a selective search condition (for example, a WHERE clause with a very narrow restriction) or queries that return an unordered result set, external fragmentation does not affect performance. A scan of the clustered index that reads consecutive pages, however, is slowed by external fragmentation.

     In SQL Server, the unit larger than the page is the extent. One extent holds 8 pages, and extents are the physical unit of disk allocation. When a page split places the new page in a different extent, a scan has to switch between extents more often. Pages that are out of order cannot be read ahead efficiently, which causes additional physical reads and increases disk I/O.

 

Understanding internal fragmentation


    As with external fragmentation, the "internal" in internal fragmentation is also relative to the page. Consider an example.

    We create a table whose rows consist of an int (4 bytes), a char(999) (999 bytes), and a varchar that is initially empty (0 bytes), so each row holds 1003 bytes of data. Eight rows take 1003 * 8 = 8024 bytes, which together with some internal overhead just fits in one page.

    When we then update the col3 column of any one row, the new data no longer fits in the page, and the page splits.

   [Figure: schematic of the pages after the split]

    If the new page is not physically contiguous with the current page, the split also causes external fragmentation.
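
    The example above can be reproduced with a short T-SQL sketch. The table name FragDemo, the index name CIX_FragDemo, and the column names col1 and col2 are hypothetical (only col3 is named in the text), and the varchar length of 10 is an assumption. Exact page counts depend on per-row overhead, so read them from the output rather than trusting the comments.

```sql
-- Hypothetical names: FragDemo, CIX_FragDemo, col1, col2 (col3 comes from the text).
CREATE TABLE dbo.FragDemo
(
    col1 INT         NOT NULL,  -- 4 bytes
    col2 CHAR(999)   NOT NULL,  -- 999 bytes, fixed length
    col3 VARCHAR(10) NOT NULL   -- 0 bytes while empty (length 10 is an assumption)
);

CREATE CLUSTERED INDEX CIX_FragDemo ON dbo.FragDemo (col1);

-- Eight rows with an empty col3: 8 * 1003 = 8024 bytes of column data.
INSERT INTO dbo.FragDemo (col1, col2, col3)
VALUES (1, 'a', ''), (2, 'a', ''), (3, 'a', ''), (4, 'a', ''),
       (5, 'a', ''), (6, 'a', ''), (7, 'a', ''), (8, 'a', '');

-- Note Pages Scanned before the update.
DBCC SHOWCONTIG ('FragDemo');

-- Growing col3 in one row makes that row longer.
UPDATE dbo.FragDemo SET col3 = 'xxxxxxxxxx' WHERE col1 = 4;

-- If the page split, Pages Scanned is now higher and Avg. Page Density (full) is lower.
DBCC SHOWCONTIG ('FragDemo');
```

    Comparing the two DBCC SHOWCONTIG outputs shows whether the update caused a split on your instance.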

Impact of internal and external fragmentation on query performance


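    As noted above, external fragmentation hurts performance mainly because scans have to switch between extents more often, which results in more I/O operations.

    Internal fragmentation spreads the rows across more pages, which increases the number of pages a scan has to read and also lowers query performance.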

Use this command to view index fragmentation information: dbcc showcontig('[tablename]')

DBCC SHOWCONTIG scanning 'ProductCostHistory' table...
Table: 'ProductCostHistory' (114099447); index ID: 1, database ID: 15
TABLE level scan performed.
- Pages Scanned................................: 3
- Extents Scanned..............................: 2
- Extent Switches..............................: 1
- Avg. Pages per Extent........................: 1.5
- Scan Density [Best Count:Actual Count].......: 50.00% [1:2]
- Logical Scan Fragmentation ..................: 66.67%
- Extent Scan Fragmentation ...................: 50.00%
- Avg. Bytes Free per Page.....................: 2171.0
- Avg. Page Density (full).....................: 73.18%
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

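As the output shows, both Logical Scan Fragmentation and Extent Scan Fragmentation are very high, so the index fragmentation really does need to be dealt with.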
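
On SQL Server 2005 and later, the same information is also exposed by the sys.dm_db_index_physical_stats function, and DBCC SHOWCONTIG is documented as deprecated. A minimal sketch, assuming the table is in the dbo schema of the current database (the schema is an assumption; the output above gives only the table name):

```sql
SELECT i.name                            AS index_name,
       ps.index_type_desc,
       ps.page_count,
       ps.avg_fragmentation_in_percent,   -- out-of-order pages (external fragmentation)
       ps.avg_page_space_used_in_percent  -- page fullness; low values suggest internal fragmentation
FROM sys.dm_db_index_physical_stats(
         DB_ID(),                              -- current database
         OBJECT_ID(N'dbo.ProductCostHistory'), -- schema name is an assumption
         NULL, NULL,                           -- all indexes, all partitions
         'SAMPLED') AS ps                      -- LIMITED mode leaves the page-fullness column NULL
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id;
```

The two percentage columns correspond roughly to Logical Scan Fragmentation and Avg. Page Density (full) in the DBCC SHOWCONTIG output.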

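There are generally two ways to fix this: defragment the index with DBCC INDEXDEFRAG, or rebuild it with DBCC DBREINDEX. Each has advantages and disadvantages. To quote Microsoft:
The DBCC INDEXDEFRAG command is an online operation, so the index remains available while the command is running. The operation can also be interrupted without losing completed work. The drawback of this method is that it does not reorganize the data as effectively as dropping and re-creating a clustered index.

Re-creating a clustered index reorganizes the data and results in full data pages. The level of fullness can be configured with the FILLFACTOR option. The drawbacks of this method are that the index is offline during the drop/re-create cycle and that the operation is atomic. If index creation is interrupted, the index is not re-created.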

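In other words, to get a good result you still have to rebuild the index, so I decided to rebuild it.
DBCC DBREINDEX(table, index name, fill factor)
The first argument is the table name.
The second argument, if it is '', means all indexes on the table are affected.
The third argument is the fill factor, that is, how full each index page is filled. A value of 100 fills every index page completely; SELECT is then at its most efficient, but a later insert into a full page forces a page split, which is expensive. A value of 0 means the previous fill factor value is used.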

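Syntax example: dbcc dbreindex('tablename','',0)

Arguments: '' stands for all indexes on the table; a specific index name can be given instead.

0: DBCC DBREINDEX uses the fill factor that was originally specified when the index was created.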
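
Microsoft documents DBCC INDEXDEFRAG and DBCC DBREINDEX as deprecated in favor of ALTER INDEX. As a rough sketch, the two approaches discussed above map onto ALTER INDEX as follows; dbo.tablename is a placeholder, as in the syntax example, and the dbo schema is an assumption.

```sql
-- Counterpart of DBCC INDEXDEFRAG: runs online and can be interrupted
-- without losing the work already completed.
ALTER INDEX ALL ON dbo.tablename REORGANIZE;

-- Counterpart of DBCC DBREINDEX('tablename', '', 0): rebuilds all indexes.
-- No FILLFACTOR is given, so each index keeps its stored fill factor.
ALTER INDEX ALL ON dbo.tablename REBUILD;
```

Run one or the other, not both; the trade-off between them is the one described in the Microsoft quote above.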


DBCC SHOWCONTIG displays fragmentation information for the data and indexes of the specified table.

The fields are explained below:

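Pages Scanned: If you know the approximate row size and the number of rows in the table or index, you can estimate the number of pages in the index. If Pages Scanned is clearly higher than your estimate, internal fragmentation is present.

Extents Scanned: Divide Pages Scanned by 8 and round up to the next whole number. The result should match the Extents Scanned value returned by DBCC SHOWCONTIG. If the returned value is higher, external fragmentation is present. How severe it is depends on how far the returned value exceeds the estimate.

Extent Switches: This number should equal Extents Scanned minus 1. A higher number indicates external fragmentation.

Avg. Pages per Extent: Pages Scanned divided by Extents Scanned, normally 8. A value below 8 indicates external fragmentation.

Scan Density [Best Count:Actual Count]: The most useful percentage that DBCC SHOWCONTIG returns. It is the ratio of the ideal number of extents to the actual number. The percentage should be as close to 100% as possible. A lower value indicates external fragmentation.

Logical Scan Fragmentation: The percentage of out-of-order pages. It should be between 0% and 10%. A higher value indicates external fragmentation.

Extent Scan Fragmentation: The percentage of out-of-order extents encountered when scanning the leaf pages of the index. It should be 0%. A higher value indicates external fragmentation.

Avg. Bytes Free per Page: The average number of free bytes on the pages scanned. A higher number suggests internal fragmentation, but take the fill factor into account before using this number to decide whether internal fragmentation is present.

Avg. Page Density (full): The complement of the average free space per page, expressed as a percentage, that is, how full the pages are on average. A low percentage indicates internal fragmentation.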
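
As a check, the derived fields in the sample output above can be recomputed from the raw counts. This sketch assumes 8096 usable bytes per 8 KB page (8192 minus the 96-byte page header) and takes the actual count in Scan Density to be Extent Switches plus 1; the column aliases are made up for illustration.

```sql
SELECT
    CEILING(3 / 8.0)             AS best_extent_count,     -- 3 pages fit in 1 extent
    1 + 1                        AS actual_extent_count,   -- extent switches + 1 = 2
    3 / 2.0                      AS avg_pages_per_extent,  -- 1.5
    100.0 * 1 / 2                AS scan_density_pct,      -- 50, matching [1:2]
    100.0 * (1 - 2171.0 / 8096)  AS avg_page_density_pct;  -- about 73.18
```

The results agree with the reported Avg. Pages per Extent of 1.5, Scan Density of 50.00%, and Avg. Page Density (full) of 73.18%.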

Understanding fill factor


      Rebuilding the index certainly fixes fragmentation. But a rebuild is not only a chore; it also causes blocking and affects availability. With little data the cost of a rebuild is small, but once the index itself exceeds 100 MB, the time a rebuild takes becomes painful.

      This is where the fill factor comes in. The default fill factor is 0 (0 and 100 mean the same thing), which means pages are filled 100%. With full pages, the updates and inserts described earlier run out of space and cause page splits. Setting a fill factor controls how full each page is filled.

      Let's look at an example.

      Using the same table as above, I inserted 31 rows, which occupy 4 pages.

     After setting a fill factor, the data occupies 5 pages.

     Inserting another row at this point no longer causes a page split.

     [Figure: diagram illustrating the fill factor concept described above]

      As this shows, using a fill factor reduces the number of page splits during updates and inserts, but because more pages are needed, read performance drops correspondingly.
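
      A fill factor takes effect when an index is created or rebuilt. A minimal sketch, using hypothetical table and index names (dbo.FragDemo, CIX_FragDemo) and an arbitrary value of 85; the resulting page count depends on the row size, so the query reads it back instead of assuming it.

```sql
-- Hypothetical names and an arbitrary fill factor, for illustration only.
ALTER INDEX CIX_FragDemo ON dbo.FragDemo REBUILD WITH (FILLFACTOR = 85);

-- Read back the stored fill factor and the resulting leaf-level page usage.
SELECT i.name AS index_name,
       i.fill_factor,
       ps.page_count,
       ps.avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.FragDemo'), NULL, NULL, 'DETAILED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
WHERE ps.index_level = 0;  -- DETAILED returns one row per index level; keep the leaf level
```

      Running the SELECT before and after the rebuild shows the trade-off described above: more pages, each with free space left for later inserts.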

     

How to set the value of the fill factor


    There is no formula or rule that yields the right fill factor value exactly. A fill factor reduces page splits during updates and inserts, but because more pages are needed it also lowers query performance and takes more disk space. How to weigh this trade-off depends on the specific situation.

    The specific situation here means the ratio of reads to writes on the table. These are the values I consider reasonable:

    1. When the read/write ratio is greater than 100:1, do not set a fill factor; fill pages 100%.

    2. When writes outnumber reads, use a fill factor of 50%-70%.

    3. When the read/write ratio falls between the two, use a fill factor of 80%-90%.

    These numbers are only my opinion; the right setting has to be found by testing under your specific conditions.
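
    One way to estimate the read/write ratio is the sys.dm_db_index_usage_stats view. The sketch below applies the three rules above to each index in the current database. The values 60 and 85 are arbitrary picks from within the suggested ranges, and the column aliases are made up for illustration. The counters are reset when the instance restarts, so they only reflect activity since then.

```sql
SELECT OBJECT_NAME(us.object_id) AS table_name,
       i.name                    AS index_name,
       r.reads,
       us.user_updates           AS writes,
       CASE
           WHEN us.user_updates = 0             THEN 100  -- never written: fill completely
           WHEN r.reads > 100 * us.user_updates THEN 100  -- rule 1: ratio above 100:1
           WHEN us.user_updates > r.reads       THEN 60   -- rule 2: picked from 50%-70%
           ELSE 85                                        -- rule 3: picked from 80%-90%
       END AS suggested_fill_factor
FROM sys.dm_db_index_usage_stats AS us
JOIN sys.indexes AS i
  ON i.object_id = us.object_id
 AND i.index_id  = us.index_id
CROSS APPLY (SELECT us.user_seeks + us.user_scans + us.user_lookups AS reads) AS r
WHERE us.database_id = DB_ID();
```

    Treat the result as a starting point for testing, not as a setting to apply directly.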



Origin blog.csdn.net/qyx0714/article/details/77964460