[SQL Server] A bug that raises error 3624

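SQL Server administrators dread critical errors such as 823, 824, 1788x, and 3624.
When one of them shows up, it usually takes considerable effort to resolve, and data loss is a real possibility.
Today, though, I stepped on a landmine: an error 3624 that was not really a 3624...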



PS. The issue described in this post has been fixed in SQL Server 2016.

https://connect.microsoft.com/SQLServer/feedback/details/2936151/assertion-failure-when-using-option-recompile-with-an-invalid-offset-clause

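Anyone who manages SQL Server regularly will know its handful of critical error numbers, such as 823, 824, 605, 1788x, and 3624. They all mean one of three things: the SQL Server instance itself has a problem, a resource the database depends on (memory, physical disk) is failing, or an object stored in the database is damaged. In most cases these errors call for DBCC CHECKDB and a SQL dump for further analysis, and in most cases the data ends up being rescued by restoring a backup.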


Today I stepped on a big landmine. Here is what happened...

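Just as I was getting ready to leave for the day, a chat window popped up with a screenshot of an application error. It said that the application intermittently hit SQL exception ERROR 3624 while executing a particular procedure. Seeing that error made me uneasy right away: surely this was not database corruption!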

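While everyone else was Googling the error, I had run into it several times before, so I went straight to the DBAs and asked them to run DBCC CHECKDB. Yes, most articles say the same thing: run DBCC CHECKDB first. However, DBCC CHECKDB reported nothing abnormal... which was very odd.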

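We then ran DBCC CHECKTABLE on the tables one by one, and every one of them came back clean as well... Fortunately, error 3624 produces a dump, and I was fairly sure the error had nothing to do with the database itself (although I was still uneasy). Going back to the dump, the failure point was in the procedure's execution phase, so I reviewed the procedure code, but nothing in it looked wrong.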
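The integrity checks mentioned above might look roughly like the following. This is only a sketch: it uses the DEMO database and tbl_test table from the repro further down as stand-ins, and the WITH options are a common choice, not necessarily what the DBAs actually ran.

```sql
-- Sketch only: DEMO and tbl_test are stand-ins borrowed from the repro in this post.
USE DEMO;
GO
-- Check the whole database; hide informational messages, report all errors.
DBCC CHECKDB (N'DEMO') WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO
-- Check a single table; repeat for each table of interest.
DBCC CHECKTABLE (N'dbo.tbl_test') WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO
```

With NO_INFOMSGS, a clean database typically produces no output beyond the completion message, which makes a real finding easier to spot.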

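Until... I substituted the parameters one by one and found that the error is triggered by the following combination:

* The procedure uses option (recompile)

* It also uses offset … fetch next … rows

* And the value passed to offset is negative.

Let's go straight to reproducing the problem.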

use DEMO;
go

-- Create the test table
create table tbl_test
(c1 int, c2 varchar(10));
go

-- Insert test data
insert into tbl_test values (1, 'a'),(2,'b'),(3,'c');
go

-- A common use case: paged results
-- Create the procedure
create procedure page_list
@page int,
@size int
as
begin
    select * from tbl_test
    order by c1 asc
    offset (@page - 1) * @size rows fetch next @size rows only;
end
go

-- Test: normal results
exec page_list 1,2;
/*
c1  c2
1   a
2   b
*/

exec page_list 2,2;
/*
c1  c2
3   c
*/

-- Test: a negative value in OFFSET
-- The error is 10742
exec page_list 0,2;
/*
Msg 10742, Level 15, State 1, Procedure page_list, Line 7 [Batch Start Line 34]
The offset specified in a OFFSET clause may not be negative.
*/

Up to this point the error is the expected one: a negative value in OFFSET raises error 10742. Things change once option (recompile) is added.

-- Add option (recompile) to the procedure
alter procedure page_list
@page int,
@size int
as
begin
    select * from tbl_test
    order by c1 asc
    offset (@page - 1) * @size rows fetch next @size rows only
    option (recompile);
end
go

-- Pass a value that makes OFFSET negative again
-- This time there is a pause => a dump is being written
-- Then the error is raised
exec page_list 0,2;
/*
Location:   op_ppqte.cpp:12267
Expression: llSkip >= 0
SPID:       51
Process ID: 652
Msg 3624, Level 20, State 1, Procedure page_list, Line 6 [Batch Start Line 52]
A system assertion check has failed. Check the SQL Server error log for details. Typically, an assertion failure is caused by a software bug or data corruption. To check for database corruption, consider running DBCC CHECKDB. If you agreed to send dumps to Microsoft during setup, a mini dump will be sent to Microsoft. An update might be available from Microsoft in the latest Service Pack or in a Hotfix from Technical Support.
Msg 596, Level 21, State 1, Line 52
Cannot continue the execution because the session is in the kill state.
Msg 0, Level 20, State 0, Line 52
A severe error occurred on the current command.  The results, if any, should be discarded.
*/

I was baffled: why would OFFSET combined with option (recompile) raise an error that points at corruption, and even write a dump?

(Partial dump)

Memory
MemoryLoad = 30%
Total Physical = 2047 MB
Available Physical = 1413 MB
Total Page File = 2431 MB
Available Page File = 1714 MB
Total Virtual = 134217727 MB
Available Virtual = 134212073 MB
Dump thread - spid = 0, EC = 0x00000000F7F3AC60
*Stack Dump being sent to C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\LOG\SQLDump0001.txt
* *
*
* BEGIN STACK DUMP:
*   11/23/17 00:04:28 spid 51
*
* Location: op_ppqte.cpp:12267
* Expression:   llSkip >= 0
* SPID:     51
* Process ID:   652
*
* Input Buffer 62 bytes -
*             exec page_list -4,2;

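Every source I could find said to run DBCC CHECKDB, but the test database had only just been created and contained just this one table... No matter how I checked, nothing was wrong...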

-- The ERRORLOG shows part of the dump contents
-- along with the suggested handling for the error
sp_readerrorlog
/*
Error: 17066, Severity: 16, State: 1.
SQL Server Assertion: File: <op_ppqte.cpp>, line=12267 Failed Assertion = 'llSkip >= 0'. This error may be timing-related. If the error persists after rerunning the statement, use DBCC CHECKDB to check the database for structural integrity, or restart the server to ensure in-memory data structures are not corrupted.
Error: 3624, Severity: 20, State: 1.
A system assertion check has failed. Check the SQL Server error log for details. Typically, an assertion failure is caused by a software bug or data corruption. To check for database corruption, consider running DBCC CHECKDB. If you agreed to send dumps to Microsoft during setup, a mini dump will be sent to Microsoft. An update might be available from Microsoft in the latest Service Pack or in a Hotfix from Technical Support.
*/
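When the error log is long, it can help to narrow the output to the assertion entries and to list the dump files the instance has recorded. The following is a sketch, assuming the current log (log number 0) is the one of interest and that the login has permission to read the error log and to query server-level DMVs; the search string 'Assertion' is just an illustrative filter.

```sql
-- Sketch: search the current SQL Server error log (log 0, log type 1 = SQL Server)
-- for lines containing the illustrative search string 'Assertion'.
EXEC sp_readerrorlog 0, 1, N'Assertion';
GO
-- List memory dump files known to the instance, newest first.
SELECT filename, creation_time, size_in_bytes
FROM sys.dm_server_memory_dumps
ORDER BY creation_time DESC;
GO
```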

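I had intended to report this as a bug or a design issue, because the behavior does not look like a genuine ERROR 3624. In the end I found the bug-fix record, and confirmed that the same procedure no longer fails this way on SQL Server 2016.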
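On builds that still have the bug, one possible guard is to validate the paging arguments before the query runs, so that OFFSET never receives a negative value. This is a minimal sketch, not a verified fix: the procedure name page_list_safe and the error message are hypothetical and do not appear in the original repro.

```sql
-- Hypothetical defensive variant; page_list_safe is an illustrative name only.
create procedure page_list_safe
@page int,
@size int
as
begin
    -- Reject invalid paging arguments up front so the OFFSET expression
    -- can never evaluate to a negative number.
    if @page < 1 or @size < 1
    begin
        raiserror('page and size must both be >= 1.', 16, 1);
        return;
    end

    select * from tbl_test
    order by c1 asc
    offset (@page - 1) * @size rows fetch next @size rows only
    option (recompile);
end
go
```

Because the check returns before the paging statement is reached, a call such as exec page_list_safe 0,2 should fail with an ordinary severity 16 error and never reach the statement that triggers the assertion.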


Original post: 大专栏, "[SQL Server] A bug that raises error 3624"



Reposted from www.cnblogs.com/petewell/p/11490090.html