mongodb调优那些事（二）-索引

第一次使用mongo开发系统，最近遇到了性能问题，发现索引建的不好，该集合是用来存储采集的信息，字段包括发布日期publishDate(int型)、分值blendedScore(int型)、发布时间publishTime(date型)、相似数量similarNum(int型)、是否垃圾isRubbish(int型)以及信息自身的一些字段（不涉及查询及排序在此不再罗列），数量大概有千万级。
业务描述：
1）可按发布日期倒序+得分倒序+发布时间倒序。
2）可按相似数量倒序+发布时间倒序。
3）可按发布时间倒序。
以上三种情况都可按照发布日期、是否垃圾、关键词等进行搜索。

简化为mongo查询语句如下：

1）db.infos.find{publishDate:{$gte:20160310,$lte:20160318},isRubbish:0}.sort{publishDate:-1,blendedScore:-1,publishTime:-1}
2）db.infos.find{publishDate:{$gte:20160310,$lte:20160318},isRubbish:0}.sort{similarNum:-1,publishTime:-1}
3）db.infos.find{publishDate:{$gte:20160310,$lte:20160318},isRubbish:0}.sort{publishTime:-1}

为了得到最优的结果进行了以下尝试

第一次创建的索引：

db.infos.ensureIndex({isRubbish:1,publishDate:-1,blendedScore:-1,similarNum:-1,publishTime:-1},{name:"SortIdx",background:true});

考虑到节省空间，第一次按上面的方式试图将所有排序字段都放到一个索引中。这种方式的结果是：
1）业务1走索引速度最快。
2）业务2走索引，速度可接受。
3）业务3走索引，但速度慢到无法接受。
这种结果显然无法接受，所以继续修改索引。

第二次创建的索引：

db.infos.ensureIndex({publishDate:-1,blendedScore:-1,publishTime:-1},{name:"ScoreSortIdx",background:true}); //索引1
db.infos.ensureIndex({similarNum:-1,publishTime:-1},{name:"HotSortIdx",background:true});  //索引2
db.infos.ensureIndex({publishTime:-1},{name:"TimeSortIdx",background:true});  //索引3

由于每个业务的排序方式差别较大，而mongodb每次查询只会使用一个索引所以按照上面的方式创建了三个索引。这样创建后每个业务查询速度都很快。由于isRubbish是各个查询基本都会带的条件，所以考虑将isRubbish加入索引中，但是isRubbish字段只有0、1的取值，可能用作索引的意义不大，所以针对业务1的查询测试了一下。测试方法就是在该集合上再创建一个增加了isRubbish的字段的索引，创建语句：

db.infos.ensureIndex({isRubbish:1,publishDate:-1,blendedScore:-1,publishTime:-1},{name:"ScoreSortIdx1",background:true})

使用explain来查看执行情况，分别强制使用ScoreSortIdx和ScoreSortIdx1索引，mongo语句如下：

db.infos.find({publishDate:{$gte:20160310,$lte:20160318},isRubbish:0}).sort({publishDate:-1,blendedScore:-1,publishTime:-1}).skip(10000).limit(1).hint("ScoreSortIdx").explain("executionStats");

db.infos.find({publishDate:{$gte:20160310,$lte:20160318},isRubbish:0}).sort({publishDate:-1,blendedScore:-1,publishTime:-1}).skip(10000).limit(1).hint("ScoreSortIdx1").explain("executionStats")

ScoreSortIdx explain的执行结果：

{
  "executionStats": {
    "executionSuccess": true,
    "nReturned": 1,
    "executionTimeMillis": 62,
    "totalKeysExamined": 10438,
    "totalDocsExamined": 10438,
    "executionStages": {
      "stage": "LIMIT",
      "nReturned": 1,
      "executionTimeMillisEstimate": 50,
      "works": 10439,
      "advanced": 1,
      "needTime": 10437,
      "needFetch": 0,
      "saveState": 81,
      "restoreState": 81,
      "isEOF": 1,
      "invalidates": 0,
      "limitAmount": 0,
      "inputStage": {
        "stage": "SKIP",
        "nReturned": 1,
        "executionTimeMillisEstimate": 50,
        "works": 10438,
        "advanced": 1,
        "needTime": 10437,
        "needFetch": 0,
        "saveState": 81,
        "restoreState": 81,
        "isEOF": 0,
        "invalidates": 0,
        "skipAmount": 0,
        "inputStage": {
          "stage": "FETCH",
          "filter": { "isRubbish": { "$eq": 0 } },
          "nReturned": 10001,
          "executionTimeMillisEstimate": 50,
          "works": 10438,
          "advanced": 10001,
          "needTime": 437,
          "needFetch": 0,
          "saveState": 81,
          "restoreState": 81,
          "isEOF": 0,
          "invalidates": 0,
          "docsExamined": 10438,
          "alreadyHasObj": 0,
          "inputStage": { "stage": "IXSCAN", "nReturned": 10438, "executionTimeMillisEstimate": 30, "works": 10438, "advanced": 10438, "needTime": 0, "needFetch": 0, "saveState": 81, "restoreState": 81, "isEOF": 0, "invalidates": 0, "keyPattern": { "publishDate": -1, "blendedScore": -1, "publishTime": -1 }, "indexName": "ScoreSortIdx", "isMultiKey": false, "direction": "forward", "indexBounds": { "publishDate": [ "[20160318.0, 20160310.0]" ], "blendedScore": [ "[MaxKey, MinKey]" ], "publishTime": [ "[MaxKey, MinKey]" ] }, "keysExamined": 10438, "dupsTested": 0, "dupsDropped": 0, "seenInvalidated": 0, "matchTested": 0 } }
      }
    }
  }
}

ScoreSortIdx1 explain的执行结果：

{
  "executionStats": {
    "executionSuccess": true,
    "nReturned": 1,
    "executionTimeMillis": 14,
    "totalKeysExamined": 10001,
    "totalDocsExamined": 10001,
    "executionStages": {
      "stage": "LIMIT",
      "nReturned": 1,
      "executionTimeMillisEstimate": 10,
      "works": 10002,
      "advanced": 1,
      "needTime": 10000,
      "needFetch": 0,
      "saveState": 78,
      "restoreState": 78,
      "isEOF": 1,
      "invalidates": 0,
      "limitAmount": 0,
      "inputStage": {
        "stage": "SKIP",
        "nReturned": 1,
        "executionTimeMillisEstimate": 10,
        "works": 10001,
        "advanced": 1,
        "needTime": 10000,
        "needFetch": 0,
        "saveState": 78,
        "restoreState": 78,
        "isEOF": 0,
        "invalidates": 0,
        "skipAmount": 0,
        "inputStage": {
          "stage": "FETCH",
          "nReturned": 10001,
          "executionTimeMillisEstimate": 10,
          "works": 10001,
          "advanced": 10001,
          "needTime": 0,
          "needFetch": 0,
          "saveState": 78,
          "restoreState": 78,
          "isEOF": 0,
          "invalidates": 0,
          "docsExamined": 10001,
          "alreadyHasObj": 0,
          "inputStage": { "stage": "IXSCAN", "nReturned": 10001, "executionTimeMillisEstimate": 10, "works": 10001, "advanced": 10001, "needTime": 0, "needFetch": 0, "saveState": 78, "restoreState": 78, "isEOF": 0, "invalidates": 0, "keyPattern": { "isRubbish": 1, "publishDate": -1, "blendedScore": -1, "publishTime": -1 }, "indexName": "ScoreSortIdx1", "isMultiKey": false, "direction": "forward", "indexBounds": { "isRubbish": [ "[0.0, 0.0]" ], "publishDate": [ "[20160318.0, 20160310.0]" ], "blendedScore": [ "[MaxKey, MinKey]" ], "publishTime": [ "[MaxKey, MinKey]" ] }, "keysExamined": 10001, "dupsTested": 0, "dupsDropped": 0, "seenInvalidated": 0, "matchTested": 0 } }
      }
    }
  }
}

可以看到使用ScoreSortIdx1的查询时间比ScoreSortIdx不只快了一倍，当使用上面的条件进行count时，这种差距会更明显。

所以最终创建的索引：

db.infos.ensureIndex({isRubbish:1,publishDate:-1,blendedScore:-1,publishTime:-1},{name:"ScoreSortIdx",background:true}); //索引1
db.infos.ensureIndex({isRubbish:1,similarNum:-1,publishTime:-1},{name:"HotSortIdx",background:true});  //索引2
db.infos.ensureIndex({isRubbish:1,publishTime:-1},{name:"TimeSortIdx",background:true});  //索引3

发现mongodb的多列索引的第一个字段必须作为排序或查询字段，否则会执行全表扫描。所以在应用中所有的查询都保证必须使用isRubbish作为查询条件，即使不对isRubbish进行过滤，也会将查询条件加上{isRubbish:{$gte:0}}以保证查询都走索引。另外尽量丰富其他查询条件，减少命中的结果集数量。

后面读了一篇大神的文章：http://www.mongoing.com/eshu_explain3 mongo执行计划的详细讲解，得到一个建多列索引的原则：{精确匹配字段,排序字段,范围查询字段} 这样建起来的索引排序会更为高效（已做实际验证，的确是这样）。
由于该业务中存在{isRubbish:{$gte:0}}即范围查询的情况，所以最后将索引改成了下面的方式：

db.infos.ensureIndex({publishDate:-1,blendedScore:-1,publishTime:-1,isRubbish:1},{name:"ScoreSortIdx",background:true}); //索引1
db.infos.ensureIndex({similarNum:-1,publishTime:-1,isRubbish:1},{name:"HotSortIdx",background:true});  //索引2
db.infos.ensureIndex({publishTime:-1,isRubbish:1},{name:"TimeSortIdx",background:true});  //索引3

mongodb调优那些事（二）-索引

猜你喜欢