DB2自动表空间因磁盘满无法扩容导致的insert应用的bpdLatch等待

问题现象:


db2 10.5.10/AIX
db2 purescache的insert语句HANG住,查看应该状态是 UOW executing

同时观察到有大量的SQLO_LT_SQLB_BPD__bpdLatch_SX等待,且持续等待。

Latch Waiters With No Holders:
Address            Holder     Waiter     Filename             LOC        LatchType           
0x0A00020120B37E20 0          220105     /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 6506       SQLO_LT_SQLB_BPD__bpdLatch_SX
0x0A00020120B80688 0          221133     /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 6506       SQLO_LT_SQLB_BPD__bpdLatch_SX
0x0A00020120AF6368 0          221647     /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 6506       SQLO_LT_SQLB_BPD__bpdLatch_SX
0x0A00020120AF6368 0          553355     /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246        SQLO_LT_SQLB_BPD__bpdLatch_SX
0x0A00020120AF6368 0          537397     /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246        SQLO_LT_SQLB_BPD__bpdLatch_SX
0x0A00020120AF6368 0          554325     /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246        SQLO_LT_SQLB_BPD__bpdLatch_SX
0x0A00020120AF6368 0          539763     /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246        SQLO_LT_SQLB_BPD__bpdLatch_SX
0x0A00020120AF6368 0          411338     /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246        SQLO_LT_SQLB_BPD__bpdLatch_SX
0x0A00020120AF6368 0          480716     /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246        SQLO_LT_SQLB_BPD__bpdLatch_SX
0x0A00020120AF6368 0          507690     /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246        SQLO_LT_SQLB_BPD__bpdLatch_SX
0x0A00020120AF6368 0          404558     /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246        SQLO_LT_SQLB_BPD__bpdLatch_SX
0x0A00020120B80688 0          401092     /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246        SQLO_LT_SQLB_BPD__bpdLatch_SX
0x0A00020120AF6368 0          549067     /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246        SQLO_LT_SQLB_BPD__bpdLatch_SX
0x0A00020120AD1240 0          539120     /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246        SQLO_LT_SQLB_BPD__bpdLatch_SX
0x0A00020120ADBC00 0          416501     /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246        SQLO_LT_SQLB_BPD__bpdLatch_SX
0x0A00020120AF6368 0          481017     /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246        SQLO_LT_SQLB_BPD__bpdLatch_SX
0x0A00020120AD1240 0          550402     /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246        SQLO_LT_SQLB_BPD__bpdLatch_SX
0x0A00020120AF6368 0          531209     /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246        SQLO_LT_SQLB_BPD__bpdLatch_SX
0x0A00020120AF6368 0          423184     /view/db2_v105fp10_aix64_s180615_special_38579/vbs/engn/include/sqlbTPL.h 246        SQLO_LT_SQLB_BPD__bpdLatch_SX

问题分析:


收集了应用的Stack,如下
<StackTrace>
-------Frame------ ------Function + Offset------
thread_wait
getConflictComplex
getConflict
sqlbfix
sqldCreateNewAppendCache
sqldExtendAppendCache
sqldSearchPageCache
Stack traceback unavailable.
</StackTrace>

查看诊断日志,发现是因为磁盘满了,自动表空间无法扩容:

2020-03-03-16.53.05.928941+480 I23217218A5756       LEVEL: Error (OS)
PID     : 12189880             TID : 549067         PROC : db2sysc 0
INSTANCE: db2inst1             NODE : 000           DB   : CSPDB
APPHDL  : 0-35797              APPID: 197.0.1.101.40182.200303082448
AUTHID  : APPUSER              HOSTNAME: NODE01
EDUID   : 549067               EDUNAME: db2agent (CSPDB) 0
FUNCTION: DB2 Common, OSSe, ossErrorIOAnalysis, probe:100
CALLED  : OS, -, gpfs_prealloc
OSERR   : ENOSPC (28) "No space left on device"
DATA #1 : String, 132 bytes
A total of 4 analysis will be performed :
 - User info
 - ulimit info
 - Target file info
 - File system

 Target file handle = 680
DATA #2 : String, 184 bytes
  Real user ID of current process       = 550
  Effective user ID of current process  = 550
  Real group ID of current process      = 550
  Effective group ID of current process = 550
DATA #3 : String, 381 bytes
Current process limits (unit in bytes except for nofiles) :
  mem     (S/H) = unlimited / unlimited
  core    (S/H) = unlimited / unlimited
  cpu     (S/H) = unlimited / unlimited
  data    (S/H) = unlimited / unlimited
  fsize   (S/H) = unlimited / unlimited
  nofiles (S/H) = unlimited / unlimited
  stack   (S/H) = 4294967296 / 4294967296
  rss     (S/H) = unlimited / unlimited
DATA #4 : String, 41 bytes
current sbrk(0) value: 0x0000000177eb8200
DATA #5 : String, 261 bytes
Target File Information :
  Size               = 62914560
  Link               = No
  Reference path     = N/A
  Type               = 0x8000
  Permissions        = rw-------
  UID                = 550
  GID                = 550
  Last modified time = 1583225585
DATA #6 : String, 385 bytes
File System Information of the target file :
  Block size        = 32768 bytes
  Total size        = 3457784217600 bytes
  Free size         = 139460608 bytes
  Total # of inodes = 835584
  FS name           = N/A
  Mount point       = /db2data
  FSID (major,minor)= 2, 151
  FS type name      = mmfs
  DIO/CIO mount opt = None
  Device type       = N/A
  FS type           = 0x4
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
  [0] 0x09000000177BD758 pdOSSeLoggingCallback + 0x5A0
  [1] 0x090000001D43457C oss_log__FP9OSSLogFacUiN32UlN26iPPc + 0x1BC
  [2] 0x090000001D4349F0 ossLogSysRC + 0x70
  [3] 0x090000001D45D6BC ossErrorIOAnalysis__FCPC21OSSErrorAnalysisParam + 0xD7C
  [4] 0x090000001993E164 sqloSystemErrorHandler + 0x518
  [5] 0x0900000018AC2924 sqloSetFileSize + 0x26D0
  [6] 0x0900000018AC1CE8 sqloSetFileSize + 0x1A94
  [7] 0xFFFFFFFFFFFFFFFC ?unknown + 0xFFFFFFFF
  [8] 0x0900000018AA8ED8 sqloSetFileSize + 0x3BC
  [9] 0x09000000132DDA1C sqlbResizeDMSContainer__FP12SQLB_POOL_CBiUiT3P12SQLB_GLOBALS + 0x4E0
  [10] 0x09000000132DE348 sqlbARExtendContainers__FlP12SQLB_POOL_CBP16SQLB_POOLCONT_CBbPlT5P16SQLB_POOL_MAP_CBP12SQLB_GLOBALS + 0x234
  [11] 0x09000000190FE2CC sqlbGrowPool__FP12SQLB_POOL_CBP12SQLB_GLOBALS + 0x96C
  [12] 0x0000000000000000 ?unknown + 0x0
  [13] 0x0900000019122CA0 sqlbGrowPool__FP12SQLB_POOL_CBP12SQLB_GLOBALS + 0x5F54
  [14] 0x09000000190AEF64 sqlbSearchSomeSMPs__FUiT1bPUiPbT4P12SQLB_POOL_CBP16SQLB_OBJECT_DESCT5UsP11SQLB_FIX_CBP12SQLB_GLOBALS + 0x350
  [15] 0x09000000190CFF20 sqlbAllocateExtent__FP12SQLB_GLOBALSP12SQLB_POOL_CBP16SQLB_OBJECT_DESCPUiUs + 0x1354
  [16] 0x09000000190D4E14 sqlbAllocateExtent__FP12SQLB_GLOBALSP12SQLB_POOL_CBP16SQLB_OBJECT_DESCPUiUs + 0x6248
  [17] 0x090000001785AB74 sqlbDMScheckObjAlloc__FP16SQLB_OBJECT_DESCUliT3PPUiP12SQLB_GLOBALS + 0x2C8
  [18] 0x0900000016744F18 sqlbExtendObject__FP12SQLB_GLOBALSP16SQLB_OBJECT_DESCUiN23PFP8sqeAgentP9SQLB_PAGEPv_vPvT3PUiP11SQLB_FIX_CB + 0x490
  [19] 0x0900000016757FF4 @114@sqldCreateAppendCachePagesForBaseTable__FP13SQLD_DFM_WORKP15SQLD_PAGE_CACHEUlPUlP11SQLB_FIX_CBlPUiP16SQLD_TABLE_CACHEP16dm + 0x180
  [20] 0x090000001676280C sqldCreateNewAppendCache__FP13SQLD_DFM_WORKP15SQLD_PAGE_CACHEUlT3PUlP16SQLD_TABLE_CACHET6 + 0x77C
  [21] 0x0900000016760D64 sqldExtendAppendCache__FP13SQLD_DFM_WORKUlP16SQLD_TABLE_CACHET3 + 0xD34
  [22] 0x0900000017C680C4 sqldSearchPageCache__FP13SQLD_DFM_WORKP15SQLD_PAGE_CACHEP16SQLD_TABLE_CACHET3 + 0x2F70
  [23] 0x0900000017C4C564 sqldInsertRow__FP13SQLD_DFM_WORKi + 0x190
  [24] 0x0900000017C4C564 sqldInsertRow__FP13SQLD_DFM_WORKi + 0x190
  [25] 0x0900000017C4BE5C sqldRowInsert__FP8sqeAgentUsT2UcUliPP10SQLD_VALUEP8SQLZ_RIDPPv + 0x658
  [26] 0x0900000017C4B330 sqlrinsr__FP8sqlrr_cbUsT2iT2PP10SQLD_VALUEQ3_10sqlri_iudo11t_iudoFlags17t_iudoFlagsKernelP8SQLZ_RIDPPv + 0xC8
  [27] 0x0900000017C4B5DC sqlriisr__FP8sqlrr_cb + 0x1D0
  [28] 0x0900000017B4319C sqlriSectInvoke__FP8sqlrr_cbP12sqlri_opparm + 0x94
  [29] 0x0900000017E7380C sqljsParseRdbAccessed__FP13sqljsDrdaAsCbP13sqljDDMObjectP14db2UCinterface + 0x1ECC
  [30] 0x0900000017E72C7C sqljsParseRdbAccessed__FP13sqljsDrdaAsCbP13sqljDDMObjectP14db2UCinterface + 0x133C
  [31] 0x0900000017E726C0 sqljsParseRdbAccessed__FP13sqljsDrdaAsCbP13sqljDDMObjectP14db2UCinterface + 0xD80
  [32] 0x0900000017EC3A1C @72@sqljsSqlam__FP14db2UCinterfaceP8sqeAgentb + 0x9B4
  [33] 0x0900000017EC3A1C @72@sqljsSqlam__FP14db2UCinterfaceP8sqeAgentb + 0x9B4
  [34] 0x0900000017E791B8 @72@sqljsSqlam__FP14db2UCinterfaceP8sqeAgentb + 0x1A5C
  [35] 0x0900000018FA87E8 @72@sqljsDriveRequests__FP8sqeAgentP14db2UCconHandle + 0xA8
  [36] 0x0900000018FA92FC @72@sqljsDrdaAsInnerDriver__FP18SQLCC_INITSTRUCT_Tb + 0x600
  [37] 0x0900000018705C34 RunEDU__8sqeAgentFv + 0x4150C
  [38] 0x090000001871B02C RunEDU__8sqeAgentFv + 0x124
  [39] 0x090000001715B660 EDUDriver__9sqzEDUObjFv + 0x138
  [40] 0x09000000178EDA08 sqloEDUEntry + 0x394
  [41] 0x090000000053FE10 _pthread_body + 0xF0
  [42] 0xFFFFFFFFFFFFFFFC ?unknown + 0xFFFFFFFF

问题解决:


清理了无用的磁盘数据,问题解决。

总结:

按照之前的常识,如果表空间无法扩容,insert语句一般会报错,而不是无何止的HANG住,所以HANG的问题除了db2pd之外,还要看db2diag.log,事实上,db2diag.log是解决DB2问题中最重要的工具之一。

猜你喜欢

转载自blog.csdn.net/qingsong3333/article/details/104639115