A Brief Study of the SQL Server IO Subsystem (I)

Author: obuntu

I like the four-step approach to locating performance problems inside SQL Server:
1. Resource bottlenecks: i. CPU ii. memory iii. IO
2. Tempdb bottlenecks
3. Slowly executing statements, identified through three checks: i. statistics ii. missing indexes iii. blocking
4. Analysis of cached execution plans

More information can be found in this article: http://blogs.msdn.com/b/jimmymay/archive/2008/09/01/sql-server-performance-troubleshooting-methodology.aspx

As it shows, when a system runs into performance problems, the first step is to determine which of the three resources, CPU, memory, or IO, is the bottleneck, and the IO subsystem is the most likely candidate. The IO subsystem is in fact a deep topic. Factors that can affect its performance include the number, size, and speed of the disks; the file allocation unit size; the HBA; network bandwidth; disk cache; the controller; whether a SAN (storage area network) is used; the RAID level; bus speed; the IO channels; and so on.

As users of SQL Server we rarely adjust the configuration of the IO subsystem, partly for lack of attention and partly for lack of knowledge and skill in this area. But understanding it is still very necessary: besides getting more out of the hardware, it helps greatly in locating and analyzing performance problems when they arise.

Contents
IO subsystem concepts
SQL Server IO concepts
Performance Monitor counters in the IO subsystem
SQLIO
Some IO best practices
Summary
References


IO subsystem concepts

A. Disk

Disks have developed rapidly over the past few decades, from ATA to SATA to SAS and now SSD, and each technological change has brought a leap in disk performance. The most widely used today is probably the 15K rpm SAS disk, and for such disks some traditional concepts still apply, such as track and sector.

A modern hard drive generally consists of a stack of platters. Each platter surface is divided into an equal number of numbered tracks, and each track is divided into a number of arcs called sectors, generally 512 bytes each, though 1K, 2K, and 4K sectors also exist. The tracks at the same position on different platters make up a cylinder, so the number of cylinders equals the number of tracks per surface, and the number of heads equals the number of platter surfaces. This is the so-called CHS geometry of a disk: Cylinder, Head, Sector. The disk capacity equals cylinders * heads * sectors per track * sector size.
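The capacity formula above can be sketched in a few lines. The sample geometry below (16383/16/63, the classic BIOS-reported geometry) is my own illustration, not a figure from the article:

```python
# Capacity arithmetic for the CHS geometry described above:
# disk size = cylinders * heads * sectors per track * bytes per sector.
def chs_capacity(cylinders, heads, sectors_per_track, sector_bytes=512):
    return cylinders * heads * sectors_per_track * sector_bytes

# e.g. the classic 16383/16/63 geometry reported by older BIOSes
size = chs_capacity(16383, 16, 63)
print(size, round(size / 2**30, 2))   # bytes, GiB
```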

Disks behave very differently for sequential versus random IO. For sequential reads and writes, today's 10K rpm disks achieve transfer rates of about 40MB/s to 80MB/s, and 15K rpm disks about 70MB/s to 125MB/s. For random access, performance depends on the rotational speed and seek time of the disk. A 10K rpm disk takes 6ms to complete a full rotation (1 * 60 * 1000 / 10000). Rotational latency is the time the head, already on the target track, spends waiting for the target sector to rotate under it. The average wait is half the time of one full rotation, generally 4ms or less; it is commonly taken as 3ms for a 10K disk and 2ms for a 15K disk.

Another factor affecting performance is seek time: the average time, after the disk receives a command, for the head to move from its starting position to the track where the data resides. Today's 10K rpm disks average about 4.6ms seek for reads and 5.2ms for writes; 15K rpm disks average about 3.5ms for reads and 4.2ms for writes.

For a small 8KB read, the transfer time is about 0.1ms and other factors can be ignored. Combining the figures above, a random read incurs a total latency of roughly 8ms on a 10K disk and 5.6ms on a 15K disk, so a single disk generally delivers about 125 IOPS (10K disk) or 175 IOPS (15K disk) for random reads.
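The arithmetic behind those numbers can be sketched as a back-of-envelope model. The inputs are the article's averages (seek time, rotational latency, 0.1ms transfer), not measurements of any real disk:

```python
# Single-spindle random-read model from the figures above.
def random_read_latency_iops(rpm, avg_seek_ms, transfer_ms=0.1):
    """Estimate per-IO latency (ms) and IOPS for one disk."""
    full_rotation_ms = 60_000 / rpm            # one revolution in ms
    rotational_latency_ms = full_rotation_ms / 2
    total_ms = avg_seek_ms + rotational_latency_ms + transfer_ms
    return total_ms, 1000 / total_ms

lat10, iops10 = random_read_latency_iops(10_000, 4.6)   # 10K rpm disk
lat15, iops15 = random_read_latency_iops(15_000, 3.5)   # 15K rpm disk
print(f"10K rpm: {lat10:.1f} ms/IO, ~{iops10:.0f} IOPS")
print(f"15K rpm: {lat15:.1f} ms/IO, ~{iops15:.0f} IOPS")
```

The model lands close to the article's round figures of ~125 and ~175 IOPS.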

The above is the ideal case. If the data on disk is concentrated in a small region, the average seek time drops and performance improves. However, if multiple IO requests arrive simultaneously, they must be queued and sorted, so although throughput rises, latency grows. In general, if the data is spread across the whole disk, the deeper the queue (queue depth), the longer the latency: at a queue depth of 4 latency reaches about 20ms, and at a depth of 32 it can reach 100ms. Queue depth is the number of IOs the disk can have outstanding in parallel. A queue depth of 2 is therefore often recommended, though different storage and different systems have different recommended values, which the vendor's documentation may supply. One case is worth noting: if the data occupies only a small region of the disk, say 5%, latency does not climb sharply as the queue deepens (roughly 20ms at a depth of 8 and 40ms at a depth of 16), and random read performance improves greatly, reaching about 400 IOPS. This characteristic gives transaction processing considerable room to maneuver.

B. RAID

Real production environments today rarely place files on a single disk; they use RAID instead. RAID brings both better performance and effective fault tolerance. Simply put, RAID is a storage technology that combines multiple independent drives (physical disks) in different ways into a disk group (logical drive), providing higher performance than a single disk as well as data redundancy. The different ways of composing the array are called RAID levels.

There are many RAID levels, which is to say many ways of combining disks. The most common today are RAID10 and RAID5. RAID10 generally outperforms RAID5, but it is also more expensive. The choice of RAID level has a large impact on system performance, so weigh your own situation, test thoroughly, and then decide.

C. Other related concepts

The file allocation unit (cluster) size is generally an integral multiple of the sector size; for example, with a 4K cluster and 512-byte sectors, one cluster occupies 8 sectors. It is specified when formatting the disk with the format command's /A:size option. Typically 64K is the more appropriate size for SQL Server data and log files, though 32K can sometimes give better performance; test adequately before settling on a value. The following is an example of viewing the file allocation unit size.
C:\Documents and Settings\Administrator> fsutil fsinfo ntfsinfo d:
NTFS Volume Serial Number :   0xde500ef9500ed7e3
Version :                     3.1
Number Sectors :              0x0000000012c03620
Total Clusters :              0x0000000012c03620
Free Clusters :               0x000000001098efb6
Total Reserved :              0x0000000000000000
Bytes Per Sector :            512
Bytes Per Cluster :           512
Bytes Per FileRecord Segment :    1024
Clusters Per FileRecord Segment : 2
Mft Valid Data Length :       0x0000000004a68000
Mft Start Lcn :               0x0000000000600000
Mft2 Start Lcn :              0x0000000009601b10
Mft Zone Start :              0x0000000000625460
Mft Zone End :                0x0000000002b80800

When configuring RAID, one parameter that can be set manually is the stripe size of the logical drive, which represents the amount of data, in KB, the controller writes to a single physical disk at one time. The choice of stripe size directly affects performance measures such as throughput and IOPS. With a small stripe size, each I/O request is answered by multiple disks, which can raise the I/O access rate (IOPS); with a large stripe size, separate I/O requests can each be answered by a different disk, which can raise the data transfer rate (MBps). To achieve higher performance, choose a stripe size no larger than the operating system's cluster size. Large stripes yield higher read performance, especially for sequential reads; when reads are random, a smaller stripe size is preferable.

So you can see that the settings above can also help SQL Server performance, but it is hard to give a universal recommendation, and often the defaults are best kept. If you do need to tune them, consult the storage vendor, or test thoroughly yourself.


SQL Server IO concepts

The SQL Server engine has its own internal mechanism for managing disk IO, and understanding how SQL Server processes IO is worthwhile. Microsoft has two very good white papers, "SQL Server I/O Basics Chapter 1" and "SQL Server I/O Basics Chapter 2", which cover this in depth; if you are interested in the area, they are not to be missed, though they are English-only and add up to about 100 pages. The main points of SQL Server IO are briefly explained below; for more details, refer to the two white papers.

Write Ahead Logging (WAL) Protocol

Before SQL Server writes to the data files, it must first write the corresponding log records to the transaction log file on disk; this is the WAL mechanism. It is what protects and hardens transactions, and it is the means by which the durability property of transactions is achieved. SQL Server implements WAL by passing the FILE_FLAG_WRITE_THROUGH flag to CreateFile.

Synchronous vs Asynchronous I/O

Synchronous I/O means the I/O API waits for the I/O request to complete before proceeding; asynchronous I/O means the API issues the I/O request, carries on with other work, and checks back after a while to see whether the I/O has completed.

SQL Server uses asynchronous I/O 98% of the time, which lets it continue making effective use of the CPU and other resources after issuing a page read or write. On Windows, asynchronous I/O uses the OVERLAPPED structure to hold the information associated with an I/O, and HasOverlappedIoCompleted to tell whether the I/O has finished. SQL Server 2005 introduced the sys.dm_io_pending_io_requests dynamic management view, whose io_pending column corresponds to HasOverlappedIoCompleted.
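The two patterns can be contrasted with a small sketch. Here a Python thread pool stands in for Windows OVERLAPPED I/O (the real code path uses ReadFile/WriteFile with an OVERLAPPED structure); `future.done()` plays the role of HasOverlappedIoCompleted. The file and helper are my own illustration:

```python
# Synchronous vs "asynchronous" reads, with a thread pool standing in
# for Windows overlapped I/O. Conceptual sketch only.
import concurrent.futures
import os
import tempfile

def read_block(path, offset, size):
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)

# set up a small test file
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(bytes(range(256)) * 32)    # 8KB of data

# synchronous: the caller blocks until the read completes
data = read_block(path, 0, 8)

# "asynchronous": issue the request, keep working, check completion later
with concurrent.futures.ThreadPoolExecutor() as pool:
    future = pool.submit(read_block, path, 8, 8)  # like issuing overlapped I/O
    busy_work = sum(range(1000))                  # CPU keeps working meanwhile
    done = future.done()                          # like HasOverlappedIoCompleted
    result = future.result()                      # wait for completion

os.remove(path)
print(data, result)
```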

Scatter / Gather I / O

Before SQL Server 2000, when a checkpoint flushed dirty pages from the buffer pool to disk, SQL Server had to maintain a list of dirty pages and write them out in order, so one troublesome page write could drag down the performance of the whole checkpoint. SQL Server 2000 therefore introduced Scatter/Gather I/O. Scatter means that when pages are read from disk into memory, no contiguous memory allocation is required: the pages can be scattered across different places in the buffer pool, implemented via the ReadFileScatter API. Gather means that when dirty pages are written from memory to disk, there is no need to maintain the earlier kind of dirty-page list; after scanning the buffer pool, the scattered dirty pages are written directly to a contiguous block region on disk via the WriteFileGather API. Clearly this approach is more efficient, and it is used not only in the SQL Server I/O path but also for the page file.
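The scatter-read idea has a POSIX analogue, `os.preadv`, which fills several non-contiguous buffers from one read call, much as ReadFileScatter does on Windows (SQL Server itself uses the Win32 API; this sketch assumes Linux and Python 3.7+):

```python
# Scatter-read demonstration: one read call fills three separate
# (non-contiguous) buffers, analogous to ReadFileScatter.
import os
import tempfile

PAGE = 8192  # SQL Server page size, for flavor

# build a file holding three "pages"
fd, path = tempfile.mkstemp()
os.write(fd, b"A" * PAGE + b"B" * PAGE + b"C" * PAGE)

# three destination buffers scattered in memory, filled by a single
# read starting at file offset 0
bufs = [bytearray(PAGE), bytearray(PAGE), bytearray(PAGE)]
nread = os.preadv(fd, bufs, 0)

os.close(fd)
os.unlink(path)
print(nread, bufs[0][:1], bufs[1][:1], bufs[2][:1])
```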

Sector alignment, Block Alignment

In SQL Server, the transaction log is not written in page-size (8KB) units but in sector-size units. This prevents a sector holding already-written log records from being rewritten and the log thereby corrupted. A parity bit is maintained in each sector; when writing to the log file, checking the parity bits determines whether the sector can be written, ensuring the validity of the log.

In fact, the sector size is transparent to the user: SQL Server automatically adjusts to the sector size of the disk. For example, after restoring a database from a disk with 512-byte sectors onto one with 1024-byte sectors, subsequent log writes proceed in 1024-byte units.

The minimum block unit is 8KB, and on a disk, by default, the first 63 sectors are hidden sectors storing the MBR (Master Boot Record); these hidden sectors occupy 31.5KB. This is known as the partition offset. If it is not set appropriately, extra I/O is generated, hurting performance. We will look at this issue in detail in a follow-up.

In general, to determine an appropriate offset, you can use the formula ((partition offset in sectors) * (disk sector size)) / (stripe unit size) and make sure the result is an integer. For example, with a 256KB stripe size, the offset must be at least 512 sectors for the formula to yield an integer, so the offset should be set to at least 256KB.
(63 * 512) / 262144 = 0.123046875
(64 * 512) / 262144 = 0.125
(128 * 512) / 262144 = 0.25
(256 * 512) / 262144 = 0.5
(512 * 512) / 262144 = 1
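The alignment check from the formula above is easy to automate; this sketch uses the article's 512-byte sector and 256KB stripe:

```python
# Partition-alignment check: offset (in sectors) * sector size must
# divide evenly by the stripe unit size.
SECTOR = 512
STRIPE = 256 * 1024  # 256KB stripe unit

def aligned(offset_sectors, sector_bytes=SECTOR, stripe_bytes=STRIPE):
    ratio = offset_sectors * sector_bytes / stripe_bytes
    return ratio, ratio == int(ratio)

for off in (63, 64, 128, 256, 512):
    ratio, ok = aligned(off)
    print(f"offset {off:3d} sectors -> ratio {ratio:<12} aligned={ok}")
```

Only the 512-sector (256KB) offset yields an integer, matching the worked examples above.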

The sector size recorded in a database file can also be viewed with dbcc fileheader('dbname').

Page Latching: A Read Walk-through

A latch is a kind of lightweight lock used to protect various system resources; for I/O, latches protect pages in memory to ensure data consistency. In SQL Server there are two classes of IO-related latch waits, PAGEIOLATCH_* and PAGELATCH_*, and these two wait types can be used to determine whether a problem lies in the I/O or the memory area. Like locks, latches have SH (shared) and EX (exclusive) modes.

PAGEIOLATCH_* waits occur while a page is being read from or written to disk; if reads and writes take too long, these waits become obvious. For example, when reading a page from the physical file, an EX latch is requested and released only after the read completes, ensuring the page cannot be modified by others during the read. PAGELATCH_* latches are taken on pages already in memory, and only when needed. An SH latch does not block other SH latches, but it does block EX latches.

Note also that latches occur only in user mode; contention for resources in kernel mode is managed by the SQLOS.

To reduce contention on hot pages, SQL Server also introduces a sub-latch mechanism. Sub-latches occur only on pages already in memory. For example, when SQL Server detects a sustained period of heavy SH latch activity on a page, it promotes the held latch to sub-latches: the single latch structure is partitioned into one queue per logical CPU, so that a worker need only request the SH sub-latch on its local scheduler. This avoids chained cross-scheduler activity, uses fewer resources, and improves the ability to handle hot pages. All of this happens automatically in SQL Server, without our intervention.
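The SH/EX semantics described above (shared holders coexist; an exclusive holder excludes everyone) can be sketched as a toy latch. This is a conceptual model only; SQL Server's latch is a far more elaborate SQLOS structure:

```python
# A toy SH/EX latch: SH does not block SH, but EX blocks everyone.
import threading

class Latch:
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0        # current SH holders
        self._writer = False     # current EX holder

    def acquire_sh(self):
        with self._cond:
            while self._writer:              # SH waits only on EX
                self._cond.wait()
            self._readers += 1

    def release_sh(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_ex(self):
        with self._cond:
            while self._writer or self._readers:   # EX waits on everyone
                self._cond.wait()
            self._writer = True

    def release_ex(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

latch = Latch()
latch.acquire_sh()
latch.acquire_sh()     # a second SH succeeds immediately: SH doesn't block SH
latch.release_sh()
latch.release_sh()
latch.acquire_ex()     # EX succeeds once all SH holders have released
latch.release_ex()
print("ok")
```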

Reading Page

When a worker thread needs a page, it calls the BufferPool::GetPage routine. GetPage scans the BUF structures; if the requested page is found, a latch is added to the page and it is returned to the caller; if it is not found, the page must be read from disk.

A page read involves various behaviors, such as the read-ahead mechanism, but the basic steps are as follows:
Step 1: send the memory manager a request to allocate a page of fixed size;
Step 2: associate the page with a BUF structure that tracks it;
Step 3: add an EX latch to the page to prevent it from being modified;
Step 4: insert the BUF structure into a hash table, so that all requests for the same page use the same BUF and are protected by the EX latch; if the object is already in the hash table, this step is skipped and the content is accessed through the hash table directly;
Step 5: build the I/O request and issue it (asynchronous I/O);
Step 6: attempt to acquire the latch in the requested mode;
Step 7: check error conditions, and raise an error if one occurred.

If an error occurs, it leads to other activity; for example, if the checksum verification fails, a reread occurs. As the steps show, when the page read completes the EX latch is not released immediately; it is held until the page has been verified.
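The steps above can be sketched as a hash-table lookup with a miss path. The names BufferPool/GetPage/BUF follow the article's description; the code itself is purely illustrative and collapses the asynchronous read and checksum verification into a synchronous placeholder:

```python
# Conceptual sketch of the GetPage path: hash table keyed by page id;
# a miss allocates a BUF, holds it EX during the "read", then grants
# the requested latch mode.
class Buf:
    def __init__(self, page_id):
        self.page_id = page_id
        self.latch = "EX"       # step 3: EX latch while the read is in flight
        self.data = None

class BufferPool:
    def __init__(self, disk):
        self.hash_table = {}    # step 4: page id -> BUF
        self.disk = disk        # stand-in for the data file

    def get_page(self, page_id, mode="SH"):
        buf = self.hash_table.get(page_id)
        if buf is None:                      # cache miss: read from "disk"
            buf = Buf(page_id)               # steps 1-3
            self.hash_table[page_id] = buf   # step 4
            buf.data = self.disk[page_id]    # step 5 (synchronous here)
            # checksum verification would happen before releasing the EX latch
            buf.latch = mode                 # steps 6-7
        else:
            buf.latch = mode                 # page already cached
        return buf

disk = {1: b"page-one", 2: b"page-two"}
pool = BufferPool(disk)
p = pool.get_page(1)
q = pool.get_page(1)     # hash-table hit: the same BUF is returned
print(p.data, p is q)
```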

Writing Page

Writing a page is very similar to reading one. A page being written is a page that already exists in memory whose BUF status is marked dirty (changed); dirty pages can be viewed via sys.dm_os_buffer_descriptors. SQL Server writes pages by calling WriteMultiple. Three kinds of threads are involved in writing pages: the lazywriter, the checkpoint, and eager write.

The lazywriter is a thread that periodically scans the buffer pool to check the size of the free list. As of SQL Server 2008 it uses the TLA (time of last access) algorithm, an improvement on LRU. The lazywriter judges pages by this algorithm; if a dirty page has aged out, it calls WriteMultiple to write the dirty page to disk.

The checkpoint is used to ensure that the pages changed by all committed transactions have been written to disk; the checkpoint is the starting point for recovery. Unlike the lazywriter, the checkpoint does not remove dirty pages from the cache but marks them clean. Many conditions can trigger a checkpoint; when one occurs, WriteMultiple is called to perform the writes.

Eager write applies to operations such as BCP and writes to blob fields, where certain pages must be written from memory to disk for the transaction to complete; such writes are eager writes, again performed by calling WriteMultiple.

When a page write is requested, not only the requested dirty page but also contiguous dirty pages are written, reducing I/O requests and improving I/O performance. Writing a page generally also requires an EX latch, to prevent the page from being modified mid-write; however, SQL Server does allow readers to use an SH latch on the page while it is being written.
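The contiguous-write idea can be sketched by grouping adjacent dirty page ids into runs, so that one I/O covers each run instead of one I/O per page. The name WriteMultiple comes from the article; this grouping code is my own illustration of the principle, not SQL Server's algorithm:

```python
# Group physically adjacent dirty page ids into contiguous runs, each
# run a candidate for a single gathered write.
def contiguous_runs(dirty_page_ids):
    runs = []
    for pid in sorted(dirty_page_ids):
        if runs and pid == runs[-1][-1] + 1:
            runs[-1].append(pid)      # extend the current run
        else:
            runs.append([pid])        # start a new I/O request
    return runs

dirty = {7, 3, 4, 5, 12}
print(contiguous_runs(dirty))   # three I/Os instead of five
```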

PAE and AWE

Little needs to be said here, but note two things. First, PAE and AWE are independent of each other: enabling AWE does not require PAE, and enabling PAE does not require AWE. Second, AWE only extends the buffer pool; it does not extend the plan cache.

Read Ahead

If we turn on set statistics io on, we often see how much content was read ahead. The read-ahead mechanism greatly enhances SQL Server's asynchronous I/O capability.

Sparse Files and Copy On Write (COW) Pages

Sparse files are mainly used by online DBCC and snapshot databases. The actual space used by a sparse file is generally much smaller than its nominal size. Creating a snapshot database brings copy-on-write behavior: when a page is about to be modified, a check determines whether the page has already been copied to the snapshot database, and if not, the page is written to the snapshot before it is changed, guaranteeing that the snapshot keeps the contents as of the moment it was taken. To maintain the snapshot data, the parent database chains file control blocks (FCBs) to manage the correspondence between the snapshot and its parent, so copy-on-write targets can be located quickly.

Although a snapshot is small at first, it gradually grows as the parent database changes, so take this into account when creating a database snapshot. Also, since the data files may interact frequently, the snapshot needs to sit on a device with good I/O performance.

When a query reads from a snapshot database, the read first takes place in the snapshot; if the relevant page has not yet been copied from the parent, the read request goes through the FCB to the parent and reads the page from the parent database. This keeps the snapshot sparse to a large degree.
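The copy-on-write behavior just described can be modeled with a toy page store: parent writes first copy the old page image into the snapshot, and snapshot reads fall back to the parent for uncopied pages. This is purely illustrative, not the SQL Server sparse-file implementation:

```python
# Toy copy-on-write snapshot over a dict-based page store.
class Database:
    def __init__(self, pages):
        self.pages = dict(pages)
        self.snapshots = []

    def create_snapshot(self):
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write_page(self, page_id, data):
        old = self.pages.get(page_id)
        for snap in self.snapshots:          # copy-on-write: preserve the
            if page_id not in snap.pages:    # pre-change image exactly once
                snap.pages[page_id] = old
        self.pages[page_id] = data

class Snapshot:
    def __init__(self, parent):
        self.parent = parent
        self.pages = {}      # sparse: holds only pages copied so far

    def read_page(self, page_id):
        if page_id in self.pages:            # copied page: read locally
            return self.pages[page_id]
        return self.parent.pages[page_id]    # else redirect to the parent

db = Database({1: "v1", 2: "v2"})
snap = db.create_snapshot()
db.write_page(1, "v1-changed")
print(snap.read_page(1), snap.read_page(2), len(snap.pages))
```

Note that page 2 was never modified, so the snapshot still serves it straight from the parent and stays sparse.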

DBCC also uses snapshots to do its work, though these are internally maintained snapshots. This clears up a common misunderstanding that "DBCC CHECKDB will lock the database's pages." In fact, since 2005, DBCC completes its consistency checks against an internally maintained snapshot of the database. This does require relatively high I/O and extra space; if those conditions cannot be met, you can use WITH TABLOCK to run the consistency check directly against the database files.

Scribbler(s)
A "scribbler" is named after a child scribbling outside the lines of a picture: a component modifies memory outside its own area, changing data it does not own. This can cause data corruption. To catch such behavior, SQL Server 2000 introduced the torn page detection mechanism, and SQL Server 2005 additionally introduced the checksum mechanism.

If page_audit is set to checksum, the lazywriter checks pages in memory by recalculating the checksum value on each page; if the values do not match, it logs the error and removes the page from memory directly, indicating that a "scribbler" is present. Tracking down a scribbler's page is harder, but trace flag -T831 can be turned on to gain more detail.

Page checking is an important part of SQL Server IO; for more on it, refer to the SQL Server I/O Basics white papers.

As we can see, SQL Server has rich internal I/O management. Understanding these concepts helps us better understand how SQL Server works, and to rise to the occasion when we run into internal errors or I/O subsystem configuration problems. For more details, please refer to the SQL Server I/O Basics white papers.

Reproduced from: https://www.cnblogs.com/kevinGao/archive/2012/05/31/2555421.html

Origin: blog.csdn.net/weixin_34235135/article/details/93342847