Detailed explanation of MongoDB index-03

MongoDB index

An index is a data structure used to query data quickly; the B+Tree is the most commonly used index structure in databases. MongoDB uses B+Tree-based indexes, and indexes are created on collections. Without an index, MongoDB scans all documents and then matches the qualifying ones; a query that uses an index locates documents through the index instead, which can greatly improve query efficiency.

Classification of Indexes

By the number of fields, indexes divide into single-key indexes and compound (composite) indexes.
By field type, they divide into primary-key indexes and secondary (non-primary-key) indexes.
By the correspondence between index nodes and physical records, they divide into clustered and non-clustered indexes: a clustered index's nodes contain the data records themselves, while a non-clustered index's nodes contain only pointers to the records.
By special characteristics, they divide into unique indexes, sparse indexes, text indexes, geospatial indexes, and so on.
Like most databases, MongoDB supports a variety of rich index types, including some commonly used structures such as single-key indexes, composite indexes, and unique indexes. Due to the use of flexible document types, it also supports indexing nested fields and arrays. By building a suitable index, we can greatly improve the speed of data retrieval. In some special application scenarios, MongoDB also supports different features such as geospatial index, text retrieval index, and TTL index.

Index Design Principles

1. In principle, every query should have a corresponding index
2. A single index should be designed to satisfy as many queries as possible
3. The choice and order of index fields should consider query coverage and selectivity
4. Be careful when indexing frequently updated fields
5. For array (multikey) indexes, carefully consider how many elements the arrays may grow to
6. Use indexes on very long string fields with caution
7. Do not create too many indexes on a single collection with high concurrent updates

Index Operations

Create an Index

Syntax:
db.collection.createIndex(keys, options)
In keys, each key is a field to index; the value 1 creates an ascending index and -1 a descending index.
The optional parameters are as follows:

background (Boolean)
    Building an index blocks other database operations; specify {background: true} to build the index in the background. Defaults to false.
unique (Boolean)
    Specify true to create a unique index. Defaults to false.
name (string)
    The name of the index. If not specified, MongoDB generates a name by concatenating the indexed field names and sort directions.
dropDups (Boolean)
    Deprecated since version 3.0. Whether to drop duplicate documents when building a unique index. Defaults to false.
sparse (Boolean)
    Do not index documents that lack the field. Pay special attention to this parameter: if set to true, documents that do not contain the field cannot be found through this index. Defaults to false.
expireAfterSeconds (integer)
    A value in seconds that sets the TTL, i.e. how long documents live in the collection.
v (index version)
    The index version number. The default depends on the mongod version running when the index is created.
weights (document)
    The index weight, a value from 1 to 99,999, indicating the significance of this field relative to the other indexed fields in the text-search score.
default_language (string)
    For text indexes: the language that determines the list of stop words and the rules for the stemmer and tokenizer. Defaults to english.
language_override (string)
    For text indexes: the name of a field in the document that specifies the language to use, overriding the default.

// build the index in the background
db.values.createIndex({open: 1, close: 1}, {background: true})
// create a unique index
db.values.createIndex({title: 1}, {unique: true})

View Indexes

// view index information
db.books.getIndexes()
// view index keys
db.books.getIndexKeys()

View index space usage

db.collection.totalIndexSize([is_detail])
is_detail: optional. If any value other than 0 or false is passed, the size of each index in the collection is displayed along with the total; if 0 or false is passed, only the total size of all indexes is displayed. Defaults to false.

Delete an Index

 

// drop a specific index on a collection
db.col.dropIndex("index_name")
// drop all indexes on a collection (the _id index cannot be dropped)
db.col.dropIndexes()

Index Types

Single-Key Index

Creates an index on a single field. MongoDB automatically creates a unique single-key index on _id, which is why querying by _id is so common. A single-key index is used for exact matches, sorting, and range lookups on the indexed field.

db.books.createIndex({title:1})

 

Create an index on an embedded document field (the author.name field here is assumed for illustration):

db.books.createIndex({"author.name": 1})

Compound Index

A compound index is composed of multiple fields and behaves much like a single-field index. The difference is that the order of the fields in a compound index, and their ascending/descending directions, directly affect query performance, so different query scenarios must be considered when designing one.

db.books.createIndex({type:1,favCount:1})

 

Multikey Index

 

Builds an index on an array field. A query for any value in the array can locate the document; in other words, multiple index entries (keys) reference the same document.
Create:
db.inventory.createIndex( { ratings: 1 } )

Notice:

A multikey index is easily confused with a compound index: a compound index combines multiple fields, while a multikey index has multiple keys on a single (array) field. A multikey index can also appear within a compound index, but MongoDB does not allow more than one array field in a compound index.
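Conceptually, a multikey index stores one entry per array element, each pointing back to the same document. A minimal sketch of this idea in plain JavaScript (illustrative only, not MongoDB internals):

```javascript
// Sketch: how a multikey index on "ratings" expands one document
// into several index entries (simplified, not MongoDB internals).
function multikeyEntries(doc, field) {
  const values = Array.isArray(doc[field]) ? doc[field] : [doc[field]];
  // One index entry per array element, all referencing the same _id.
  return values.map(v => ({ key: v, id: doc._id }));
}

const doc = { _id: 1, ratings: [5, 8, 9] };
const entries = multikeyEntries(doc, "ratings");
// A query like { ratings: 8 } can match the document via any single entry.
const hit = entries.find(e => e.key === 8);
```

This is why a query for any one element of the array locates the whole document.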

Geospatial Index

In the era of the mobile Internet, location-based search (LBS) is almost a standard feature of every application system. MongoDB provides very convenient geospatial retrieval: the geospatial index (2dsphere index) is a special index for location queries. Case: how does MongoDB implement "query nearby businesses"? Suppose the merchant data model is as follows:

 

db.restaurant.insert({
    restaurantId: 0,
    restaurantName: "兰州牛肉面",
    location: {
        type: "Point",
        coordinates: [73.97, 40.77]
    }
})

Create a 2dsphere index

db.restaurant.createIndex({location : "2dsphere"})
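To answer the "query nearby businesses" case above, a $near query can run against the 2dsphere index. The query document below is a sketch; the probe coordinates and the $maxDistance value are illustrative:

```javascript
// A $near query document: find restaurants within 10 km of a point.
// Coordinates are [longitude, latitude]; $maxDistance is in meters.
const nearQuery = {
  location: {
    $near: {
      $geometry: { type: "Point", coordinates: [73.88, 40.78] },
      $maxDistance: 10000
    }
  }
};
// In the mongo shell: db.restaurant.find(nearQuery)
// Results are returned sorted from nearest to farthest.
```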

Full-Text Index (Text Indexes)

MongoDB supports full-text search; building a text index enables simple tokenized text search.

db.reviews.createIndex( { comments: "text" } )
The $text operator performs a text search on a collection that has a text index. $text tokenizes the search string using whitespace and punctuation as delimiters, then performs a logical OR over all the tokens.
A full-text index addresses the need for fast text search. For example, for a collection of blog articles that must be searched quickly by content, create a text index on the article content.
db.stores.insert([{
    _id: 1,
    name: "Java Hut",
    description: "Coffee and cakes"
},
{
    _id: 2,
    name: "Burger Buns",
    description: "Gourmet hamburgers"
},
{
    _id: 3,
    name: "Coffee Shop",
    description: "Just coffee"
},
{
    _id: 4,
    name: "Clothes Clothes Clothes",
    description: "Discount clothing"
},
{
    _id: 5,
    name: "Java Shopping",
    description: "Indonesian goods"
}])

Create a full-text index on name and description:
db.stores.createIndex({name: "text", description: "text"})

 

 

Use the $text operator to find all stores containing any of the words "coffee", "shop", or "java":
db.stores.find({$text: {$search: "java coffee shop"}})

MongoDB's text index has many limitations, and there is no official Chinese tokenizer, so its application scenarios are quite limited.

Hash Index (Hashed Indexes)

Unlike a traditional B-Tree index, a hashed index is built by applying a hash function to the field value. It supports exact matches on the indexed field but not range queries, and multikey hashed indexes are not supported. Entries in a hashed index are evenly distributed, which is very useful for sharded collections.

 

 db.users.createIndex({username : 'hashed'})
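Why a hashed index supports only equality can be seen by simulating its storage: keys are hashes of the value, so value ordering is destroyed. The hash function below is a toy stand-in, not MongoDB's:

```javascript
// Sketch: a hashed index stores hash(value) -> id. Equality lookups work
// (hash the probe value and look it up), but range scans cannot, because
// hashing does not preserve ordering. toyHash is illustrative only.
function toyHash(s) {
  let h = 0;
  for (const c of s) h = (h * 31 + c.charCodeAt(0)) >>> 0;
  return h;
}

const index = new Map();
for (const name of ["alice", "bob", "carol"]) {
  index.set(toyHash(name), name);
}

// Equality: hash the sought value and look it up directly.
const found = index.get(toyHash("bob"));
// Range (e.g. username > "b"): the hashes of "bob" and "carol" are in no
// useful order relative to each other, so the index cannot help.
```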

Wildcard Indexes

MongoDB's document schema changes dynamically; wildcard indexes can be built over unpredictable fields to speed up queries on them.
db.products.insert([{
    "product_name": "Spy Coat",
    "product_attributes": {
        "material": ["Tweed", "Wool", "Leather"],
        "size": {
            "length": 72,
            "units": "inches"
        }
    }
},
{
    "product_name": "Spy Pen",
    "product_attributes": {
        "colors": ["Blue", "Black"],
        "secret_feature": {
            "name": "laser",
            "power": "1000",
            "units": "watts",
        }
    }
},
{
    "product_name": "Spy Book"
}])

db.products.createIndex( { "product_attributes.$**" : 1 } )

A wildcard index supports single-field queries on product_attributes or any of its embedded fields:

db.products.find( { "product_attributes.size.length" : { $gt : 60 } } )
db.products.find( { "product_attributes.material" : "Leather" } )
db.products.find( { "product_attributes.secret_feature.name" : "laser" })

 

Considerations: wildcard indexes are incompatible with other index types and properties.

Wildcard indexes are sparse and do not index missing fields, so they cannot support queries for documents in which the queried field does not exist.

 

// wildcard indexes cannot support the following queries
db.products.find({
    "product_attributes": {
        $exists: false
    }
}) 
db.products.aggregate([{
    $match: {
        "product_attributes": {
            $exists: false
        }
    }
}])
A wildcard index generates entries for the content of a document or array, not for the document or array as a whole, so it cannot support exact equality matches on a whole document or array. It can, however, support queries where the field equals the empty document {}.
// wildcard indexes cannot support the following queries:
 db.products.find({ "product_attributes.colors" : [ "Blue", "Black" ] } )

 db.products.aggregate([{
 $match : { "product_attributes.colors" : [ "Blue", "Black" ] }
 }])
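The limitation above follows from how wildcard index entries are generated: one entry per (path, scalar value) pair inside the content, never one entry for the array or subdocument itself. A rough sketch (simplified, not MongoDB internals):

```javascript
// Sketch: flatten a subdocument into (path, scalar) index entries, the way
// a wildcard index sees content rather than whole arrays or documents.
function wildcardEntries(obj, prefix) {
  const entries = [];
  for (const [k, v] of Object.entries(obj)) {
    const path = prefix + "." + k;
    if (Array.isArray(v)) {
      v.forEach(el => entries.push([path, el]));   // one entry per element
    } else if (v !== null && typeof v === "object") {
      entries.push(...wildcardEntries(v, path));   // recurse into subdocs
    } else {
      entries.push([path, v]);                     // scalar value
    }
  }
  return entries;
}

const attrs = { colors: ["Blue", "Black"], size: { length: 72 } };
const entries = wildcardEntries(attrs, "product_attributes");
// Entries exist for "product_attributes.colors" -> "Blue" and -> "Black",
// but no single entry equals the whole array ["Blue", "Black"], which is
// why exact array-equality matches cannot use the index.
```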

Index Properties

Unique Indexes

In real-world scenarios, uniqueness is a very common index constraint, since duplicate records cause plenty of trouble, e.g. order numbers or user login names. A unique index guarantees that the indexed fields of documents in the collection have unique values.
A unique index treats a missing indexed field as a null value, so multiple documents missing the indexed field are not allowed (their nulls would collide).
For sharded collections, the uniqueness constraint must be compatible with the sharding rule: to guarantee global uniqueness, the shard key must be a prefix of the unique index.

Partial Indexes

A partial index indexes only the documents that satisfy a specified filter expression. By indexing a subset of the documents in a collection, a partial index has lower storage requirements and cheaper index creation and maintenance. New in version 3.2.
Partial indexes provide a superset of sparse-index functionality and should be preferred over sparse indexes.
db.restaurants.createIndex(
 { cuisine: 1, name: 1 },
 { partialFilterExpression: { rating: { $gt: 5 } } }
 )
The partialFilterExpression option accepts documents specifying filter criteria:
Equality expressions (e.g. field: value, or the $eq operator)
$exists: true
$gt, $gte, $lt, $lte
$type
top-level $and
// matches the filter condition, so the index is used
db.restaurants.find( { cuisine: "Italian", rating: { $gte: 8 } } )
// does not match the filter condition, so the index cannot be used
db.restaurants.find( { cuisine: "Italian" } )
Combining a unique constraint with a partial index effectively weakens the unique constraint.
Note: if both partialFilterExpression and unique are specified, the unique constraint applies only to documents that satisfy the filter expression. A partial index with a unique constraint does not prevent the insertion of documents that do not satisfy the filter, even if they would otherwise violate the unique constraint.
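This caveat can be illustrated with a small simulation: uniqueness is checked only among documents that satisfy the filter. The field names and filter below are illustrative:

```javascript
// Sketch: a unique partial index on "email" with filter { active: true }.
// Uniqueness is enforced only among documents matching the filter.
function violatesUniquePartial(docs, newDoc, field, filter) {
  if (!filter(newDoc)) return false;  // doc is outside the partial index
  return docs.some(d => filter(d) && d[field] === newDoc[field]);
}

const filter = d => d.active === true;
const docs = [{ email: "a@x.com", active: true }];

// Duplicate email but inactive: NOT rejected (not covered by the index).
const inactiveDup = violatesUniquePartial(
  docs, { email: "a@x.com", active: false }, "email", filter);
// Duplicate email and active: rejected by the unique constraint.
const activeDup = violatesUniquePartial(
  docs, { email: "a@x.com", active: true }, "email", filter);
```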

Sparse Indexes

The sparse property ensures that the index contains entries only for documents that have the indexed field; documents without the field are skipped.
Feature: only documents in which the field exists are indexed (including documents where the field value is null).
// do not index documents that lack the xmpp_id field
db.addresses.createIndex( { "xmpp_id": 1 }, { sparse: true } )
If a sparse index would result in an incomplete result set for query and sort operations, MongoDB will not use the index unless hint() explicitly specifies the index.
An index that is both sparse and unique prevents documents in the collection from having duplicate field values, but allows insertion of documents that do not contain the indexed field.

TTL Indexes

In a typical application system, not all data needs to be stored permanently. For example, system events and user messages become less important as time passes; more to the point, storing large amounts of historical data is expensive, so projects usually age out data that has expired and is no longer used.
The usual approaches are:
Option 1: stamp each record with a timestamp and run a timer on the application side that periodically deletes expired data based on the timestamp.
Option 2: partition data into per-date collections, archive each day's data in its own collection, and likewise use a timer to drop expired collections.
For data aging, MongoDB provides a more convenient mechanism: the TTL (Time To Live) index. A TTL index must be declared on a field of date type. It is a special single-field index that MongoDB uses to automatically remove documents from a collection after a certain amount of time, or at a specific clock time.
db.log_events.insertOne( {
 "createdAt": new Date(),
 "logEvent": 2,
 "logMessage": "Success!"
 } )
db.log_events.createIndex( { "createdAt": 1 }, { expireAfterSeconds: 20 })

After the TTL fires, the document is cleaned up and the query returns no data.
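The expiration rule itself is simple: a document becomes eligible for deletion once the indexed date plus expireAfterSeconds is in the past. A sketch of the check (the background TTLMonitor applies it roughly once a minute):

```javascript
// Sketch: when does a TTL index with expireAfterSeconds: 20 consider a
// document expired? Once createdAt + 20 seconds is no later than "now".
function isExpired(doc, expireAfterSeconds, now) {
  return doc.createdAt.getTime() + expireAfterSeconds * 1000 <= now.getTime();
}

const doc = { createdAt: new Date("2024-01-01T00:00:00Z"), logEvent: 2 };
const before = isExpired(doc, 20, new Date("2024-01-01T00:00:10Z")); // 10s old: not yet
const after  = isExpired(doc, 20, new Date("2024-01-01T00:01:00Z")); // 60s old: expired
```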

Usage constraints
A TTL index does reduce development effort, and letting the database clean up automatically is more efficient and reliable. But note the following restrictions:
A TTL index supports only a single field, which must not be _id. TTL indexes cannot be used on capped collections.
A TTL index cannot guarantee timely data aging: MongoDB cleans aged data with the background TTLMonitor timer, whose default interval is 1 minute; if the database is under heavy load, TTL behavior is delayed further.
TTL cleanup uses remove (delete) operations, which is not very efficient, so the TTL monitor puts some pressure on the system's CPU and disk while it runs. By comparison, a per-date collection scheme can be more efficient.
A log-storage design might combine:
per-date collections
capped collections
a TTL index
inserts with writeConcern: {w: 0}

Index Usage Recommendations

1. Build appropriate indexes for each query

This matters for collections with large data volumes, e.g. tens of millions of documents. Without an index, MongoDB must read all documents from disk into memory, which puts heavy pressure on the MongoDB server and affects the execution of other requests.

2. Create proper compound indexes; don't rely on index intersection

If a query uses multiple fields, MongoDB has two indexing techniques available: index intersection and compound indexes. Index intersection builds a single-field index on each field and combines the indexes to produce the query result. Index intersection is currently triggered rarely, so for queries over multiple fields a compound index is recommended to ensure an index is actually used.

3. Compound index field order: equality first, range after (Equality First, Range After)

When creating a compound index, if the conditions divide into equality matches and ranges, the equality-condition fields (e.g. sport: "marathon") should come first in the compound index, and the range-condition fields (e.g. age: {$lt: 30}) should come after.
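Why equality-first matters can be seen by counting how many index keys each ordering forces the scan to touch. A simplified simulation over sorted key tuples (sample data and counts are illustrative):

```javascript
// Sketch: compare keys examined for { sport: "marathon", age: { $lt: 30 } }
// with index (sport, age) versus index (age, sport).
const docs = [
  { sport: "marathon", age: 25 }, { sport: "marathon", age: 35 },
  { sport: "tennis",   age: 22 }, { sport: "tennis",   age: 28 },
  { sport: "rowing",   age: 29 }, { sport: "rowing",   age: 41 },
];

// Index (sport, age): equality on sport narrows the scan to one contiguous
// run, and the range on age bounds it further; only marathon keys with
// age < 30 are examined.
const bySportAge = [...docs].sort((a, b) =>
  a.sport.localeCompare(b.sport) || a.age - b.age);
const examinedEq = bySportAge
  .filter(d => d.sport === "marathon" && d.age < 30).length;

// Index (age, sport): the leading range forces examining every key with
// age < 30 across all sports, then filtering each one by sport.
const byAgeSport = [...docs].sort((a, b) =>
  a.age - b.age || a.sport.localeCompare(b.sport));
const examinedRange = byAgeSport.filter(d => d.age < 30).length;
// examinedEq is smaller than examinedRange: fewer keys touched.
```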

4. Use Covered Index as much as possible

Return only the fields you actually need, and use a covering index (one containing every queried and projected field) so the query can be answered from the index alone, improving performance.
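A simplified check for when a projection can be covered by an index: every projected field must be in the index and _id must be excluded. The field names follow the earlier books examples and are illustrative:

```javascript
// Sketch: a projection is coverable by an index if every projected field
// is in the index and _id is excluded (simplified rule).
function coveredBy(indexFields, projection) {
  if (projection._id !== 0) return false;  // _id is not in the index
  return Object.keys(projection)
    .filter(f => f !== "_id")
    .every(f => indexFields.includes(f));
}

const idx = ["type", "favCount"];  // index { type: 1, favCount: 1 }
const good = coveredBy(idx, { type: 1, favCount: 1, _id: 0 }); // covered
const bad  = coveredBy(idx, { type: 1, title: 1, _id: 0 });    // title not indexed
// In the mongo shell:
//   db.books.find({ type: "novel" }, { type: 1, favCount: 1, _id: 0 })
```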

5. Build indexes in the background

While an index is being created on a collection in the foreground, the database holding the collection does not accept other read and write operations. To build an index on a collection with a large amount of data, use the background option {background: true}.

6. Avoid designing too long array index

Array indexes are multi-valued and need more storage space. If the indexed arrays are especially long, or grow without bound, the index space can balloon.

Analyzing execution plans with explain

Usually we care about:
whether the query used an index
whether the index reduced the number of records scanned
whether there was an inefficient in-memory sort
MongoDB provides the explain command, which helps us evaluate the execution plan of a given query model and adjust it to improve query efficiency.
The explain() method has the following form:
db.collection.find().explain(<verbosity>)

queryPlanner
    Details of the execution plan, including the query plan, collection information, query predicate, winning plan, query mode, and MongoDB server information.
executionStats
    Everything in queryPlanner, plus execution statistics for the winning plan.
allPlansExecution
    Everything in executionStats, plus execution statistics for the rejected candidate plans.

queryPlanner
// no index on title yet
db.books.find({title:"book‐1"}).explain("queryPlanner")

plannerVersion: execution plan version
namespace: the queried collection
indexFilterSet: whether an index filter is applied for this query shape
parsedQuery: the parsed query predicate
winningPlan: the execution plan chosen by the optimizer
stage: the query stage (scan type)
filter: the filter condition
direction: the scan direction
rejectedPlans: candidate plans that were rejected
serverInfo: MongoDB server information

executionStats

The output of executionStats mode contains all the fields of queryPlanner mode, plus the execution statistics of the winning plan.
// create the index
> db.books.createIndex({title: 1})
{
	"createdCollectionAutomatically" : true,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
> db.books.find({title:"book‐1"}).explain("executionStats")
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "restaurant.books",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"title" : {
				"$eq" : "book‐1"
			}
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"title" : 1
				},
				"indexName" : "title_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"title" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"title" : [
						"[\"book‐1\", \"book‐1\"]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 0,
		"executionTimeMillis" : 0,
		"totalKeysExamined" : 0,
		"totalDocsExamined" : 0,
		"executionStages" : {
			"stage" : "FETCH",
			"nReturned" : 0,
			"executionTimeMillisEstimate" : 0,
			"works" : 1,
			"advanced" : 0,
			"needTime" : 0,
			"needYield" : 0,
			"saveState" : 0,
			"restoreState" : 0,
			"isEOF" : 1,
			"docsExamined" : 0,
			"alreadyHasObj" : 0,
			"inputStage" : {
				"stage" : "IXSCAN",
				"nReturned" : 0,
				"executionTimeMillisEstimate" : 0,
				"works" : 1,
				"advanced" : 0,
				"needTime" : 0,
				"needYield" : 0,
				"saveState" : 0,
				"restoreState" : 0,
				"isEOF" : 1,
				"keyPattern" : {
					"title" : 1
				},
				"indexName" : "title_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"title" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"title" : [
						"[\"book‐1\", \"book‐1\"]"
					]
				},
				"keysExamined" : 0,
				"seeks" : 1,
				"dupsTested" : 0,
				"dupsDropped" : 0
			}
		}
	},
	"serverInfo" : {
		"host" : "192.168.30.130",
		"port" : 27017,
		"version" : "4.4.9",
		"gitVersion" : "b4048e19814bfebac717cf5a880076aa69aba481"
	},
	"ok" : 1
}

winningPlan.inputStage
    Describes a child stage, which supplies documents or index keys to its parent stage
winningPlan.inputStage.stage
    The child stage type
winningPlan.inputStage.keyPattern
    The index key pattern being scanned
winningPlan.inputStage.indexName
    The index name
winningPlan.inputStage.isMultiKey
    Whether it is a multikey index; true if the index is built on an array
executionStats.executionSuccess
    Whether execution succeeded
executionStats.nReturned
    The number of documents returned
executionStats.executionTimeMillis
    The overall execution time of the statement
executionStats.executionStages.executionTimeMillisEstimate
    Time spent retrieving documents (the FETCH stage)
executionStats.executionStages.inputStage.executionTimeMillisEstimate
    Time spent scanning the index
executionStats.totalKeysExamined
    The number of index keys examined
executionStats.totalDocsExamined
    The number of documents examined
executionStats.executionStages.isEOF
    Whether the end of the stream was reached; 1 or true means it was
executionStats.executionStages.works
    The number of work units; a query is decomposed into small work units
executionStats.executionStages.advanced
    The number of results advanced (returned) to the parent stage
executionStats.executionStages.docsExamined
    Documents examined by the stage

The allPlansExecution mode returns everything in executionStats, plus an allPlansExecution: [] block.
Stage types

COLLSCAN: full collection scan
IXSCAN: index scan
FETCH: retrieve documents via the index
SHARD_MERGE: merge results returned from the shards
SORT: an in-memory sort was performed
LIMIT: limit the number of returned documents
SKIP: skip documents
IDHACK: query by _id
SHARDING_FILTER: filter shard data through mongos
COUNTSCAN: stage returned when count does not use an index
COUNT_SCAN: stage returned when count uses an index
SUBPLAN: stage returned for an $or query that does not use an index
TEXT: stage returned when querying with a full-text index
PROJECTION: stage returned when limiting the returned fields

Try to avoid the following stages in an execution plan:
COLLSCAN (full collection scan)
SORT (sort without an index)
unreasonable SKIP
SUBPLAN ($or without an index)
COUNTSCAN (count without an index)

Origin blog.csdn.net/u011134399/article/details/131259754