Since SQL Server 2008, index storage can be optimized in two ways: row compression and page compression. Both reduce the space the data occupies by storing it more compactly and removing redundant data. Less space means fewer pages, and fewer pages means fewer I/O requests during a query.
1. Row Compression
Row compression reduces the size of each row by changing the storage format of the row. It can be used on a heap or a B-tree. When row compression is enabled, the following changes take effect:
The metadata overhead stored with each row of the table is reduced
Fixed-length data is stored in variable-length format
Numeric data types are also stored in variable-length format
--Create two test tables for comparison
IF OBJECT_ID('dbo.NoCompression') IS NOT NULL
    DROP TABLE dbo.NoCompression;
IF OBJECT_ID('dbo.RowCompression') IS NOT NULL
    DROP TABLE dbo.RowCompression;

SELECT SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber, OrderQty,
       ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount, LineTotal,
       rowguid, ModifiedDate
INTO dbo.NoCompression
FROM Sales.SalesOrderDetail;

SELECT SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber, OrderQty,
       ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount, LineTotal,
       rowguid, ModifiedDate
INTO dbo.RowCompression
FROM Sales.SalesOrderDetail;
Compression is enabled through the DATA_COMPRESSION option of a CREATE INDEX or ALTER INDEX statement, and it can be used on both clustered and nonclustered indexes.
The following statements build one clustered index without compression and one with row compression. In this test, row compression saves about 33% of the space.
-- no compression
CREATE CLUSTERED INDEX CLIX_NoCompression
    ON dbo.NoCompression (SalesOrderID, SalesOrderDetailID);

-- row compression
CREATE CLUSTERED INDEX CLIX_RowCompression
    ON dbo.RowCompression (SalesOrderID, SalesOrderDetailID)
    WITH (DATA_COMPRESSION = ROW);

-- check pages used
SELECT OBJECT_NAME(object_id) AS table_name,
       in_row_reserved_page_count
FROM sys.dm_db_partition_stats
WHERE object_id IN (OBJECT_ID('dbo.NoCompression'),
                    OBJECT_ID('dbo.RowCompression'));
Number of pages with and without row compression:
Compression not only reduces storage space, but also improves query performance, because fewer data pages have to be read.
SET STATISTICS IO ON;

-- Same range query against the uncompressed and the row-compressed table
SELECT SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber
FROM dbo.NoCompression
WHERE SalesOrderID BETWEEN 51500 AND 52000;

SELECT SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber
FROM dbo.RowCompression
WHERE SalesOrderID BETWEEN 51500 AND 52000;

SET STATISTICS IO OFF;
When using compression, consider the following:
1. Compression pays off mainly on large tables.
2. If the maximum row size exceeds 8,060 bytes, compression cannot be applied.
3. Nonclustered indexes do not inherit the compression setting of the heap or clustered index. Compression must be set on each index individually.
4. Compressing data costs a significant amount of CPU, so compression operations should not be run frequently.
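Points 1 and 3 can be checked before and after the fact. The following is a minimal sketch, assuming the dbo.NoCompression and dbo.RowCompression tables from the example above exist; the index name IX_RowCompression_ProductID is hypothetical. It first asks SQL Server to estimate the savings of row compression, then compresses a nonclustered index explicitly and lists the compression setting of every index on the table.

```sql
-- Estimate how much space ROW compression would save on dbo.NoCompression
EXEC sp_estimate_data_compression_savings
     @schema_name      = N'dbo',
     @object_name      = N'NoCompression',
     @index_id         = NULL,   -- NULL = all indexes on the table
     @partition_number = NULL,   -- NULL = all partitions
     @data_compression = N'ROW';

-- A nonclustered index does not inherit the setting of the clustered index,
-- so compression is requested explicitly (index name is hypothetical)
CREATE NONCLUSTERED INDEX IX_RowCompression_ProductID
    ON dbo.RowCompression (ProductID)
    WITH (DATA_COMPRESSION = ROW);

-- Show the compression setting of each index on the table
SELECT i.name AS index_name,
       p.data_compression_desc
FROM sys.partitions AS p
INNER JOIN sys.indexes AS i
    ON i.object_id = p.object_id
   AND i.index_id  = p.index_id
WHERE p.object_id = OBJECT_ID('dbo.RowCompression');
```

If the WITH clause is left out of the CREATE NONCLUSTERED INDEX statement, the last query should report NONE for that index even though the clustered index reports ROW.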
2. Page Compression
Page compression can also be applied to heaps and B-tree structures. It usually saves more space than row compression because it combines row compression, prefix compression, and dictionary compression.
Page compression first applies row compression. Then values in a column that share a common prefix within the page are compressed (prefix compression), and finally values that repeat anywhere on the page are replaced by references to a page-level dictionary (dictionary compression).
--Create test table:
IF OBJECT_ID('dbo.PageCompression') IS NOT NULL
    DROP TABLE dbo.PageCompression;

SELECT SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber, OrderQty,
       ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount, LineTotal,
       rowguid, ModifiedDate
INTO dbo.PageCompression
FROM Sales.SalesOrderDetail;
To compress:
CREATE CLUSTERED INDEX CLIX_PageCompression
    ON dbo.PageCompression (SalesOrderID, SalesOrderDetailID)
    WITH (DATA_COMPRESSION = PAGE);

SELECT OBJECT_NAME(object_id) AS table_name,
       in_row_reserved_page_count
FROM sys.dm_db_partition_stats
WHERE object_id IN (OBJECT_ID('dbo.NoCompression'),
                    OBJECT_ID('dbo.PageCompression'));
Execute the query:
SET STATISTICS IO ON;

SELECT SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber
FROM dbo.PageCompression
WHERE SalesOrderID BETWEEN 51500 AND 52000;

SET STATISTICS IO OFF;
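To see the effect of the three settings side by side, the page counts can be joined to the compression setting recorded in the catalog. This is a sketch that assumes all three test tables (dbo.NoCompression, dbo.RowCompression, dbo.PageCompression) and their clustered indexes were created as shown above; the actual page counts depend on the data.

```sql
-- Compare reserved pages and compression setting for the three test tables
SELECT OBJECT_NAME(ps.object_id)      AS table_name,
       p.data_compression_desc        AS compression,
       ps.in_row_reserved_page_count  AS reserved_pages
FROM sys.dm_db_partition_stats AS ps
INNER JOIN sys.partitions AS p
    ON p.partition_id = ps.partition_id
WHERE ps.object_id IN (OBJECT_ID('dbo.NoCompression'),
                       OBJECT_ID('dbo.RowCompression'),
                       OBJECT_ID('dbo.PageCompression'))
ORDER BY ps.in_row_reserved_page_count DESC;
```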
Indexed view:
Because of permissions, a query may be allowed to return only part of the data, and a view is then a candidate solution. For static data that is only read, an indexed view is also a good option, because its result set is stored on disk like a table.
Without indexed views
SET STATISTICS IO ON;

SELECT psc.Name,
       SUM(sod.LineTotal) AS SumLineTotal,
       SUM(sod.OrderQty)  AS SumOrderQty,
       AVG(sod.UnitPrice) AS AvgUnitPrice
FROM Sales.SalesOrderDetail sod
INNER JOIN Production.Product p
    ON sod.ProductID = p.ProductID
INNER JOIN Production.ProductSubcategory psc
    ON p.ProductSubcategoryID = psc.ProductSubcategoryID
GROUP BY psc.Name
ORDER BY psc.Name;
Create an indexed view:
CREATE VIEW dbo.ProductSubcategorySummary
WITH SCHEMABINDING -- required to create an indexed view
AS
SELECT psc.Name,
       SUM(sod.LineTotal) AS SumLineTotal,
       SUM(sod.OrderQty)  AS SumOrderQty,
       SUM(sod.UnitPrice) AS TotalUnitPrice, -- AVG() is not allowed, so store the sum
       COUNT_BIG(*)       AS Occurrences     -- required with GROUP BY; also used for the average
FROM Sales.SalesOrderDetail sod
INNER JOIN Production.Product p
    ON sod.ProductID = p.ProductID
INNER JOIN Production.ProductSubcategory psc
    ON p.ProductSubcategoryID = psc.ProductSubcategoryID
GROUP BY psc.Name;
GO

-- create a clustered index
CREATE UNIQUE CLUSTERED INDEX CLIX_ProductSubcategorySummary
    ON dbo.ProductSubcategorySummary (Name);
GO

SET STATISTICS IO ON;

-- The average is computed at query time from the stored sum and count
SELECT Name,
       SumLineTotal,
       SumOrderQty,
       TotalUnitPrice / Occurrences AS AvgUnitPrice
FROM dbo.ProductSubcategorySummary
ORDER BY Name;
Logical reads drop significantly once the indexed view is used.
Indexed views are very effective when several tables are repeatedly joined into a single result, because the joined result is stored once and the I/O needed to perform the join at query time is reduced.
Restrictions for indexed views:
1. All columns in the view must be deterministic
2. Indexed views must be created with the SCHEMABINDING option
3. The clustered index must be created with the UNIQUE option
4. Referenced tables must be specified with their schema name (schema.table)
5. Some aggregate functions, such as AVG(), cannot be used in indexed views
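The restrictions above can be seen together in a small, self-contained sketch. The table dbo.DemoSales, the view dbo.DemoSalesSummary, and the index name are hypothetical and are not part of AdventureWorks. The script assumes the default session settings of SSMS or sqlcmd (for example ANSI_NULLS and QUOTED_IDENTIFIER set to ON), which indexed views require, and uses GO as the batch separator of those tools.

```sql
-- Hypothetical demo table
CREATE TABLE dbo.DemoSales
(
    SaleID   INT           NOT NULL PRIMARY KEY,
    Category NVARCHAR(50)  NOT NULL,
    Amount   DECIMAL(10,2) NOT NULL
);
GO

-- Restrictions 2 and 4: SCHEMABINDING and a schema-qualified table name
CREATE VIEW dbo.DemoSalesSummary
WITH SCHEMABINDING
AS
SELECT Category,
       SUM(Amount)  AS TotalAmount,  -- restriction 5: SUM is allowed, AVG is not
       COUNT_BIG(*) AS RowCnt        -- required when the view uses GROUP BY
FROM dbo.DemoSales
GROUP BY Category;
GO

-- Restriction 3: the first index on the view must be unique and clustered
CREATE UNIQUE CLUSTERED INDEX CLIX_DemoSalesSummary
    ON dbo.DemoSalesSummary (Category);
GO

-- The average is derived at query time from the stored sum and count
SELECT Category,
       TotalAmount / RowCnt AS AvgAmount
FROM dbo.DemoSalesSummary WITH (NOEXPAND);
```

The NOEXPAND hint asks the optimizer to read the view's own index instead of expanding the view into its base tables. Whether the hint is needed depends on the SQL Server edition and version in use.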