Beware of low-efficiency index seeks

    A work order analysis service in production has recently been unstable, and the monitoring platform raises database operation timeout alarms from time to time.
    After communicating with the operations team, we found that the failed business operations all occur at around 1:00 AM every day, yet database monitoring shows no obvious anomalies.
    In the logs of the analysis service we found that some database operations were throwing SocketTimeoutException.
    The developers initially wanted to work around the problem by increasing the timeout parameters of the MongoDB Java Driver.
    However, after detailed analysis we concluded that this would not cure the problem, and how much to adjust the timeout configuration is also hard to assess.
    What follows is a record of the analysis and tuning process for this issue.

1. Preliminary analysis

    From the error message, the database operation timed out waiting for a response; the client was configured with a SocketReadTimeout of 60s.
So, what kind of operation could keep the database from returning for more than 60 seconds?
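
As a point of reference, this read timeout is normally set through the driver's connection options. A minimal sketch, assuming a standard connection string (the host and database names are placeholders, not from the original incident):

// socketTimeoutMS is the documented connection-string option that controls
// how long the driver waits for a server response (here, 60 seconds).
// Placeholder host/database; only the option name matters.
mongodb://mongo-host:27017/orderdb?socketTimeoutMS=60000

Raising this value would only delay the alarm, so the sections below look for the root cause instead.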

2. The business operation

    The work order table (t_work_order) records information about each work order, including the work order number (oid) and the last modified time (lastModifiedTime).
    The analysis service is a Java application; at 1:00 AM every day it pulls the work orders modified during the previous day (sorted by work order number) and processes them.
    Since the work order table is very large (tens of millions of rows), the service pages through it (1,000 records at a time), paginating by work order number:

(1) The first pull

db.t_work_order.find({
   "lastModifiedTime":{
      $gt: new Date("2019-04-09T09:44:57.106Z"),
      $lt: new Date("2019-04-09T10:44:57.106Z")},
   "oid": {$exists: true}})
   .sort({"oid":1}).limit(1000)

(2) Subsequent pulls, using the work order number of the last record from the previous pull as the starting point

db.t_work_order.find({
   "lastModifiedTime":{
      $gt: new Date("2019-04-09T09:44:57.106Z"),
      $lt: new Date("2019-04-09T10:44:57.106Z")},
   "oid": {$exists: true, $gt: "VXZ190"}})
   .sort({"oid":1}).limit(1000)

(3) For these queries, the developers created the following index on the table:

db.t_work_order.createIndex({
   "oid" : 1,
   "lastModifiedTime" : -1
})

    Although this index basically matches the query fields, it performed very poorly in practice:
    the first pull took a very long time, often exceeding 60s and triggering errors, while some of the later pulls were faster.
    To simulate the scenario accurately, we preset a small portion of the data in a test environment and ran explain on the first pull:

db.t_work_order.find({
   "lastModifiedTime":{
      $gt: new Date("2019-04-09T09:44:57.106Z"),
      $lt: new Date("2019-04-09T10:44:57.106Z")},
   "oid": {$exists: true}})
   .sort({"oid":1}).limit(1000)
   .explain("executionStats")

Output:

"nReturned" : 1000,
"executionTimeMillis" : 589,
"totalKeysExamined" : 136661,
"totalDocsExamined" : 1000,

...

"indexBounds" : {
    "oid" : [
        "[MinKey, MaxKey]"
    ],
    "lastModifiedTime" : [
        "(new Date(1554806697106), new Date(1554803097106))"
    ]
},
"keysExamined" : 136661,
"seeks" : 135662,

During execution, retrieving 1,000 records still required examining about 136,000 index entries!
Almost all of that cost was spent on the seeks operation.

3. The cause of the index seeks

The official documentation explains seeks as follows:
The number of times that we had to seek the index cursor to a new position in order to complete the index scan.


     MongoDB indexes (3.x and above) are implemented as B+ trees. Scanning consecutive leaf nodes is very fast (only a single positioning is needed), so a large number of seeks means the scan spent a lot of time repositioning the cursor (jumping to non-contiguous target nodes). The seeks metric has only been reported since version 3.4, and it is reasonable to infer that a large number of seeks hurts performance.
    To explore how the seeks were generated, we made some changes to the query; the small helper sketched below makes the variants easier to compare:
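
A minimal mongo-shell helper using only documented explain("executionStats") fields (the function name and output format are ours):

// Report the scan cost of a given filter against t_work_order,
// keeping the sort/limit of the business query fixed.
function scanCost(filter) {
   var es = db.t_work_order.find(filter)
      .sort({ "oid": 1 }).limit(1000)
      .explain("executionStats").executionStats;
   print("keys=" + es.totalKeysExamined +
         " docs=" + es.totalDocsExamined +
         " ms=" + es.executionTimeMillis);
}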

(1) Remove the $exists condition
The $exists condition is there for historical reasons (some old records do not contain the work order number field). To check whether this condition is the crux of the problem, we modified the query as follows:

db.t_work_order.find({
   "lastModifiedTime":{
      $gt: new Date("2019-04-09T09:44:57.106Z"),
      $lt: new Date("2019-04-09T10:44:57.106Z")}
   })
   .sort({"oid":1}).limit(1000)
   .explain("executionStats")

The execution results are:

"nReturned" : 1000,
"executionTimeMillis" : 1533,
"totalKeysExamined" : 272322,
"totalDocsExamined" : 272322,

...

"inputStage" : {
  "stage" : "FETCH",
  "filter" : {
      "$and" : [
          {
              "lastModifiedTime" : {
                  "$lt" : ISODate("2019-04-09T10:44:57.106Z")
              }
          },
          {
              "lastModifiedTime" : {
                  "$gt" : ISODate("2019-04-09T09:44:57.106Z")
              }
          }
      ]
},

...

"indexBounds" : {
    "oid" : [
        "[MinKey, MaxKey]"
    ],
    "lastModifiedTime" : [
        "[MaxKey, MinKey]"
    ]
},
"keysExamined" : 272322,
"seeks" : 1,

    Here we found that after removing $exists, seeks dropped to 1, but the whole query examined 272,322 index entries! That is exactly twice the count before the condition was removed.

    The seeks count dropping to 1 shows that the scan became a sequential walk over the leaf nodes. But because the scan range is very large, finding the target records meant sequentially scanning, and filtering out, a large number of non-qualifying entries.

    The filter that appears in the FETCH stage confirms this. We also examined a characteristic of the data: every work order number corresponds to two records! That explains the doubling: when the $exists condition is present, the executor chooses to seek (jump) from one work order number to the next, as shown below:
[Figure: seek-based retrieval, jumping from one work order number to the next]

When the $exists condition is absent, the executor chooses a sequential scan over the leaf nodes, as shown below:
[Figure: sequential scan over consecutive leaf nodes]

(2) The $gt condition and reverse-order scanning
In addition to the first query, we also analyzed the subsequent paging queries:

db.t_work_order.find({
   "lastModifiedTime":{
      $gt: new Date("2019-04-09T09:44:57.106Z"),
      $lt: new Date("2019-04-09T10:44:57.106Z")},
   "oid": {$exists: true, $gt: "VXZ190"}})
   .sort({"oid":1}).limit(1000)
   .explain("executionStats")

The statement above mainly adds the condition $gt: "VXZ190". It executes as follows:

"nReturned" : 1000,
"executionTimeMillis" : 6,
"totalKeysExamined" : 1004,
"totalDocsExamined" : 1000,

...

"indexBounds" : {
    "oid" : [
        "(\"VXZ190\", {})"
    ],
    "lastModifiedTime" : [
        "(new Date(1554806697106), new Date(1554803097106))"
    ]
},
"keysExamined" : 1004,
"seeks" : 5,

    We can see that the number of seeks is very small, and the scan examined only 1,004 index entries; the efficiency is very high.

    So, does this mean that toward the tail of the data, the records satisfying the query condition are very dense?

    To verify this, we adjusted the first-page query to sort by work order number in descending order (scanning from the back toward the front):

db.t_work_order.find({
   "lastModifiedTime":{
      $gt: new Date("2019-04-09T09:44:57.106Z"),
      $lt: new Date("2019-04-09T10:44:57.106Z")},
   "oid": {$exists: true}})
   .sort({"oid":-1}).limit(1000)
   .explain("executionStats")

The new "reverse order query," the implementation process is as follows:

"nReturned" : 1000,
"executionTimeMillis" : 6,
"totalKeysExamined" : 1001,
"totalDocsExamined" : 1000,

...

"direction" : "backward",
"indexBounds" : {
    "oid" : [
        "[MaxKey, MinKey]"
    ],
    "lastModifiedTime" : [
        "(new Date(1554803097106), new Date(1554806697106))"
    ]
},
"keysExamined" : 1001,
"seeks" : 2,

    We can see that this executes far more efficiently: almost no seeks are needed at all!
    Further investigation confirmed that, in this data's distribution, more recently updated work orders tend to have larger work order numbers, so the target data of the query is concentrated almost entirely at the tail.

    This explains the phenomenon mentioned at the beginning: the first-page query is very slow or even times out, while the subsequent queries are fast.

The two query execution directions mentioned above are illustrated below:

① with the $gt condition added, retrieval starts from the middle of the index
② in reverse order, retrieval starts from the end

[Figure: the two scan directions over the index]

4. Optimization ideas

The analysis shows that the crux of the problem is that the index scan range is too large. So how can the query be optimized to avoid scanning so many records?

With the existing index and conditions, the combination of the $gt and $exists predicates over such a wide range of leaf nodes inevitably produces seeks, so query performance is unstable and depends heavily on the data distribution and on the specific query.

Therefore, merely raising the socketTimeout threshold, as proposed at the outset, would only be a stopgap: once the distribution of the indexed values changes, or the data volume keeps growing, something worse could happen.

Going back to the original requirement: a scheduled job must read the work orders updated each day (sorted by work order number) and process them in batches.

Following the idea of splitting the whole range into discrete buckets, we add a new field, lastModifiedDay, which stores the date part of lastModifiedTime (rounded down to midnight), so that all work orders updated on the same day share the same value.
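
A minimal backfill sketch for existing records (the field name comes from the text; the UTC truncation and one-by-one updates are our assumptions, and a real migration over tens of millions of rows would be batched):

// Derive lastModifiedDay from lastModifiedTime for records that lack it.
db.t_work_order.find({ "lastModifiedDay": { $exists: false } }).forEach(function (doc) {
   var t = doc.lastModifiedTime;
   var day = new Date(Date.UTC(t.getUTCFullYear(), t.getUTCMonth(), t.getUTCDate()));
   db.t_work_order.updateOne({ _id: doc._id }, { $set: { "lastModifiedDay": day } });
});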

We then create a compound index on {lastModifiedDay: 1, oid: 1}:
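
A sketch of the index creation, using the current createIndex shell helper (the equality field first, then the field used for the range and the sort):

db.t_work_order.createIndex({
   "lastModifiedDay" : 1,
   "oid" : 1
})

The corresponding query then becomes: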

db.t_work_order.find({
   "lastModifiedDay": new Date("2019-04-09T00:00:00.000Z"),
   "oid": {$gt: "VXZ190"}})
   .sort({"oid":1}).limit(1000)

The execution results are as follows:

"nReturned" : 1000,
"executionTimeMillis" : 6,
"totalKeysExamined" : 1000,
"totalDocsExamined" : 1000,

...

"indexBounds" : {
    "lastModifiedDay" : [
        "(new Date(1554803000000), new Date(1554803000000))"
    ],
    "oid" : [
        "(\"VXZ190\", {})"
    ]
},
"keysExamined" : 1000,
"seeks" : 1,

After this optimization, each query examines at most 1,000 index entries, and the query is very fast!
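
Putting it together, the daily job can now page through one day's work orders over a tight index range (a sketch; the variable names are ours and the date is the example day):

// Pull one day's work orders in pages of 1,000, ordered by oid.
var day = new Date("2019-04-09T00:00:00.000Z");
var lastOid = null;
while (true) {
   var cond = { "lastModifiedDay": day };
   if (lastOid !== null) cond.oid = { $gt: lastOid };
   var batch = db.t_work_order.find(cond).sort({ "oid": 1 }).limit(1000).toArray();
   if (batch.length === 0) break;          // all pages consumed
   // ... process the batch ...
   lastOid = batch[batch.length - 1].oid;  // next page starts after this oid
}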


Source: https://blog.csdn.net/m0_37886429/article/details/101025291