1. Regular indexes

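Indexes are an important tool for speeding up database queries. They only pay off when the dataset is reasonably large, so first insert 20,000 documents into the database: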

> for(i=0;i<20000;i++){
... db.numbers.save({num:i});
... }
WriteResult({ "nInserted" : 1 })
> db.numbers.count()
20000

The insert loop takes a few seconds to complete.

Run a query with explain (which shows the query plan); the result looks something like this:

> db.numbers.find({num:{"$gt":19995}}).explain("executionStats")
{
        "queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "test.numbers",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "num" : {
                                "$gt" : 19995
                        }
                },
                "winningPlan" : {
                        "stage" : "COLLSCAN",
                        "filter" : {
                                "num" : {
                                        "$gt" : 19995
                                }
                        },
                        "direction" : "forward"
                },
                "rejectedPlans" : [ ]
        },
        "executionStats" : {
                "executionSuccess" : true,
                "nReturned" : 4,
                "executionTimeMillis" : 10,
                "totalKeysExamined" : 0,
                "totalDocsExamined" : 20000,
                "executionStages" : {
                        "stage" : "COLLSCAN",
                        "filter" : {
                                "num" : {
                                        "$gt" : 19995
                                }
                        },
                        "nReturned" : 4,
                        "executionTimeMillisEstimate" : 11,
                        "works" : 20002,
                        "advanced" : 4,
                        "needTime" : 19997,
                        "needYield" : 0,
                        "saveState" : 156,
                        "restoreState" : 156,
                        "isEOF" : 1,
                        "invalidates" : 0,
                        "direction" : "forward",
                        "docsExamined" : 20000
                }
        },
        "serverInfo" : {
                // host-specific information omitted
        },
        "ok" : 1
}

explain() accepts the following arguments: 'queryPlanner' (the default, same as passing no argument), 'executionStats', and 'allPlansExecution'.

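Here we can see that the query had to scan all 20,000 documents (docsExamined) and took about 10 ms. In a real environment a NoSQL database holds far more than 20,000 documents; with 2,000,000 documents, a linear extrapolation gives roughly 1 s, which is clearly unacceptable.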
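The 1 s figure is a simple proportional estimate. As a minimal sketch, assuming scan time grows linearly with the number of documents examined (real timings will vary with hardware and caching), the arithmetic looks like this; the function name is made up for illustration:

```javascript
// Linear extrapolation of collection-scan time (an assumption, not a measurement):
// cost is taken to be proportional to the number of documents examined.
function estimateScanMillis(measuredMillis, measuredDocs, targetDocs) {
  return measuredMillis * (targetDocs / measuredDocs);
}

// 10 ms for 20,000 documents, extrapolated to 2,000,000 documents.
var estimatedMillis = estimateScanMillis(10, 20000, 2000000); // 1000
```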

1) Creating and dropping indexes

Use createIndex to build an index on the num field:

> db.numbers.createIndex({num:1})
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}
> db.numbers.getIndexes()
[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "test.numbers"
        },
        {
                "v" : 2,
                "key" : {
                        "num" : 1
                },
                "name" : "num_1",
                "ns" : "test.numbers"
        }
]

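num:1 means an ascending index is built on the num field. getIndexes() shows that there are now two indexes, one on _id and one on num. To drop an index, use dropIndex, which takes the same key pattern as createIndex; the index on _id cannot be dropped.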
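As a sketch of the drop-and-recreate cycle, the helper below drops the num index by key pattern and builds it again, so the collection ends up in the same state as before. The name recreateNumIndex and its coll parameter are illustrative; in the shell you would pass db.numbers.

```javascript
// Drop the index on `num`, then build it again.
// `coll` is a collection handle such as db.numbers (illustrative parameter).
function recreateNumIndex(coll) {
  coll.dropIndex({ num: 1 });   // same key pattern as createIndex; the name "num_1" also works
  coll.createIndex({ num: 1 }); // ascending index on num
  // Return the index names so the caller can confirm the result.
  return coll.getIndexes().map(function (ix) { return ix.name; });
}
```

Calling recreateNumIndex(db.numbers) should list _id_ and num_1 again.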

createIndex also accepts options:

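Parameter	Type	Description
background	Boolean	Building an index blocks other database operations; setting background to true builds the index in the background instead. Defaults to false. (MongoDB 4.2 and later ignore this option.)
unique	Boolean	Whether the index is unique. Set to true to create a unique index. Defaults to false.
name	string	Name of the index. If omitted, MongoDB generates a name by concatenating the indexed field names and their sort orders.
dropDups	Boolean	Whether to delete duplicate documents when building a unique index. Defaults to false. (Removed in MongoDB 3.0.)
sparse	Boolean	If true, the index only contains entries for documents that have the indexed field. Take care with this option: queries that use a sparse index will not return documents lacking the field. Defaults to false.
expireAfterSeconds	integer	A value in seconds that sets a TTL, controlling how long documents in the collection are kept.
v	index version	The index version number. The default depends on the mongod version running when the index is created.
weights	document	For text indexes, field-and-weight pairs. Each weight is an integer from 1 to 99,999 and expresses the field's significance relative to the other indexed fields when scoring.
default_language	string	For text indexes, the language that determines the stop-word list and the stemmer and tokenizer rules. Defaults to english.
language_override	string	For text indexes, the name of the document field that holds the language overriding the default. Defaults to language.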
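A few of these options in use might look like the sketch below. The collection handles and the field names email and createdAt are hypothetical and not part of the numbers example; only the option names come from the table.

```javascript
// Hypothetical examples of createIndex options; collections and fields are made up.
function createExampleIndexes(accounts, sessions) {
  // Unique, sparse index with an explicit name: documents without `email` get no entry.
  accounts.createIndex({ email: 1 }, { unique: true, sparse: true, name: "email_unique" });
  // TTL index: documents become eligible for removal about one hour after `createdAt`.
  sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 });
}
```

In the shell this would be called as createExampleIndexes(db.accounts, db.sessions), with both collection names being placeholders.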

Running explain on the same query again, the "executionStages" section now looks like this:

                "executionStages" : {
                        "stage" : "FETCH",
                        "nReturned" : 4,
                        "executionTimeMillisEstimate" : 0,
                        "works" : 5,
                        "advanced" : 4,
                        "needTime" : 0,
                        "needYield" : 0,
                        "saveState" : 0,
                        "restoreState" : 0,
                        "isEOF" : 1,
                        "invalidates" : 0,
                        "docsExamined" : 4,
                        "alreadyHasObj" : 0,
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "nReturned" : 4,
                                "executionTimeMillisEstimate" : 0,
                                "works" : 5,
                                "advanced" : 4,
                                "needTime" : 0,
                                "needYield" : 0,
                                "saveState" : 0,
                                "restoreState" : 0,
                                "isEOF" : 1,
                                "invalidates" : 0,
                                "keyPattern" : {
                                        "num" : 1
                                },
                                "indexName" : "num_1",
                                ... // the rest is not important for now
                        }
                }

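The newly created index was picked automatically, only 4 documents were examined (docsExamined), and the estimated time rounds down to 0 ms, a dramatic improvement over the full collection scan.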
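To see where the difference comes from, here is a toy model outside MongoDB. It is only a sketch: a sorted array of keys stands in for the index, and the function names are made up. A full scan touches every document, while a binary search over sorted keys jumps straight to the matching range.

```javascript
// Toy model, not MongoDB internals: 20,000 documents and a sorted list of their keys.
var toyDocs = [];
for (var i = 0; i < 20000; i++) toyDocs.push({ num: i });
var toyKeys = toyDocs.map(function (d) { return d.num; }); // already in ascending order

// Collection scan: every document is examined.
function toyCollScan(docs, min) {
  var examined = 0, returned = 0;
  docs.forEach(function (d) { examined++; if (d.num > min) returned++; });
  return { examined: examined, returned: returned };
}

// Index scan: binary-search for the first key > min, then visit only the matches.
function toyIndexScan(sortedKeys, min) {
  var lo = 0, hi = sortedKeys.length;
  while (lo < hi) {
    var mid = (lo + hi) >> 1;
    if (sortedKeys[mid] > min) hi = mid; else lo = mid + 1;
  }
  var matches = sortedKeys.length - lo;
  return { examined: matches, returned: matches };
}
```

With min set to 19995, toyCollScan examines 20000 entries and toyIndexScan examines 4, mirroring the docsExamined values in the two explain outputs.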

To force the _id index instead, use hint():

> db.numbers.find({num:{"$gt":19995}}).hint({_id:1}).explain("executionStats")
                        ... // omitted
                        "needTime" : 19996,
                        "needYield" : 0,
                        "saveState" : 156,
                        "restoreState" : 156,
                        "isEOF" : 1,
                        "invalidates" : 0,
                        "docsExamined" : 20000,
                        "alreadyHasObj" : 0,
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "nReturned" : 20000,
                                "executionTimeMillisEstimate" : 10,
                                "works" : 20001,
                                "advanced" : 20000,
                                "needTime" : 0,
                                "needYield" : 0,
                                "saveState" : 156,
                                "restoreState" : 156,
                                "isEOF" : 1,
                                "invalidates" : 0,
                                "keyPattern" : {
                                        "_id" : 1
                                },
                                "indexName" : "_id_",
                        ... // omitted

Now the index used is _id_, and all 20,000 documents had to be examined to complete the query.

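When data changes (for example, when new documents are inserted), MongoDB updates the affected index entries as part of the write rather than rebuilding the whole index. The reIndex() function can be used to rebuild the indexes explicitly.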

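MongoDB indexes are persisted on disk and cached in memory while in use. In my local test, the memory used by the mongod process rose by about 12 KB right after the index above was built.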

2) Covered queries

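If a query on this collection explicitly excludes the _id field, it becomes a covered query, provided that:

All fields in the query are part of an index
All fields returned by the query are in the same index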

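A covered query is answered directly from the index. No documents have to be fetched, and the index is usually cached in memory, so the query is faster.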
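A covered version of the earlier query could be sketched as follows. The helper names are illustrative, and the check in looksCovered is a heuristic of my own (no documents examined while index keys were), not an official flag in the explain output.

```javascript
// Query only on `num`, return only `num`, and exclude `_id` so the num_1 index can cover it.
// `coll` is a collection handle such as db.numbers (illustrative parameter).
function explainCoveredQuery(coll) {
  return coll.find({ num: { $gt: 19995 } }, { num: 1, _id: 0 }).explain("executionStats");
}

// Heuristic: a covered query examines index keys but no documents.
function looksCovered(explainResult) {
  var stats = explainResult.executionStats;
  return stats.totalDocsExamined === 0 && stats.totalKeysExamined > 0;
}
```

In the shell, looksCovered(explainCoveredQuery(db.numbers)) would be expected to return true once the index on num exists.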

3) When indexes are ineffective

As in other databases, there are cases where an index cannot be used effectively:

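The query contains regular expressions or negation operators such as $nin and $not.
The query contains arithmetic operators such as $mod.
The query contains a $where clause.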

> db.numbers.find({num:{$mod:[100,0]}},{num:1}).explain("executionStats")
... // omitted
                                "docsExamined" : 200,
                                "alreadyHasObj" : 0,
                                "inputStage" : {
                                        "stage" : "IXSCAN",
                                        "filter" : {
                                                "num" : {
                                                        "$mod" : [
                                                                100,
                                                                0
                                                        ]
                                                }
                                        },
                                        "nReturned" : 200,
                                        "executionTimeMillisEstimate" : 10,
                                        "works" : 20001,
                                        "advanced" : 200,
                                        "needTime" : 19800,
                                        "needYield" : 0,
                                        "saveState" : 156,
                                        "restoreState" : 156,
                                        "isEOF" : 1,
                                        "invalidates" : 0,
                                        "keyPattern" : {
                                                "num" : 1
                                        },
                                        "indexName" : "num_1"
... // omitted

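With the $mod operator the index is used, but it cannot narrow the search: the IXSCAN stage still walks all 20,000 keys (works is 20001) to find the 200 matches, so the query still takes a comparatively long time.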
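The reason is that the matching keys are scattered across the whole key range instead of forming one contiguous interval. A toy count outside MongoDB (a sketch with a made-up function name) shows the same numbers as the explain output:

```javascript
// Toy model: keys 0..keyCount-1 in index order; every key must be tested against the modulus.
function toyModScan(keyCount, divisor, remainder) {
  var keysExamined = 0, matched = 0;
  for (var key = 0; key < keyCount; key++) {
    keysExamined++;                        // no range of keys can be skipped
    if (key % divisor === remainder) matched++;
  }
  return { keysExamined: keysExamined, matched: matched };
}

var toyModResult = toyModScan(20000, 100, 0); // { keysExamined: 20000, matched: 200 }
```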

4) Index limits

MongoDB indexes work best when they fit in memory, and they are subject to a number of limits:

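Index key limit: the total size of an index entry must be less than 1024 bytes (this may include structural overhead). MongoDB 4.2 and later remove this limit.
A single collection can have no more than 64 indexes.
The fully qualified index name, including the namespace and dot separators (that is, <database name>.<collection name>.$<index name>), cannot be longer than 128 characters.
A compound index can have no more than 32 fields.
An index cannot cover a query on an array field.
Indexes should not exceed the available memory; otherwise MongoDB has to evict index pages from its cache and read them back from disk, which slows queries down.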
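To check how much space the indexes of a collection take, a small helper along these lines can be used. The name indexSizeReport and the shape of the returned object are my own; totalIndexSize() and the indexSizes field of stats() are the shell features it relies on.

```javascript
// Report index sizes in bytes for a collection handle such as db.numbers.
function indexSizeReport(coll) {
  return {
    totalBytes: coll.totalIndexSize(), // sum over all indexes of the collection
    perIndex: coll.stats().indexSizes  // one entry per index name, e.g. _id_ and num_1
  };
}
```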

For other limits, see the official documentation.

2. Full-text indexes

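Full-text search works by building an index entry for every word, recording how often and where the word appears in the text. When a user issues a query, the search program looks it up in this prebuilt index and returns the matches. The process is similar to finding a character through the lookup table of a dictionary.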

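A text index is created the same way as a regular index, except that 1 is replaced with "text". Using the user collection created earlier as a test: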

> db.user.createIndex({"hobby":"text"})
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}

On versions before 2.6, full-text search support has to be enabled explicitly:

db.adminCommand({setParameter:true,textSearchEnabled:true})

Or start the server with: mongod --setParameter textSearchEnabled=true

After that, queries can be run:

> db.user.find({$text:{$search:"book"}})
{ "_id" : ObjectId("5c3ef0037da85af675c7c109"), "name" : "wangwu", "sex" : "man", "age" : 18, "hobby" : "read book" }

The query needs both the $text and $search operators.
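When several documents match, it can help to order them by relevance. The sketch below projects and sorts on the text score via $meta; the helper name and the score field name are illustrative choices, and coll would be db.user in the shell.

```javascript
// Text search ordered by relevance; `score` is an arbitrary name for the projected field.
function searchByRelevance(coll, terms) {
  return coll
    .find({ $text: { $search: terms } }, { score: { $meta: "textScore" } })
    .sort({ score: { $meta: "textScore" } });
}
```

For example, searchByRelevance(db.user, "book") returns a cursor over the matching documents, each with an added score field.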

MongoDB Study Notes (3): Indexes