Let’s talk about some pitfalls of MongoDB insertMany

Overview

MongoDB provides several methods for inserting data into a collection:

  • Insert a single document
db.collection.insertOne()
  • Insert multiple documents
db.collection.insertMany()
  • Update methods that insert a document when no match exists in the collection and {upsert: true} is specified (a minimal sketch follows this list)
db.collection.updateOne()
db.collection.updateMany()
db.collection.findAndModify()
db.collection.findOneAndUpdate()
db.collection.findOneAndReplace()
db.collection.bulkWrite()
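
For the upsert case, here is a minimal sketch (the inventory collection and the filter are placeholders, not from the original examples): when no document matches the filter, updateOne() with {upsert: true} inserts a new document instead of updating an existing one.

// Upsert sketch: inserts { item: "notebook", qty: 50 } if no matching document exists
db.inventory.updateOne(
  { item: "notebook" },
  { $set: { qty: 50 } },
  { upsert: true }
)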

_id field

MongoDB's insert methods behave consistently with respect to the _id field. When the client inserts a document without an _id field, the database automatically adds an _id field of type ObjectId as the primary key. MongoDB maintains a unique index on the _id field of every collection, so when a document is inserted with an explicit _id, MongoDB verifies that the value is unique. After a successful insert, MongoDB returns the _id of the inserted document.

// Insert a document without an _id field into the products collection
db.products.insertOne( { item: "card", qty: 15})

// The insert succeeds and the _id of the inserted document is returned
{
	"acknowledged" : true,
	"insertedId" : ObjectId("65966778d63bea6fd2f4b7a7")
}

// Insert documents with explicit _id fields into the products collection
db.products.insertMany([
  {_id:1, item: "card", qty: 15},
  {_id:2, item: "pen", qty: 15},
] )

// The insert succeeds and the _ids of the inserted documents are returned
{
	"acknowledged" : true,
	"insertedIds" : [ 1, 2 ]
}

// Insert a document whose _id already exists in the products collection
db.products.insertMany([
  {_id:2, item: "bag", qty: 15}
] )

// The insert fails with a duplicate key error
"writeErrors" : [
		{
			"index" : 0,
			"code" : 11000,
			"errmsg" : "E11000 duplicate key error collection: test.products index: _id_ dup key: { _id: 2 }",
			"op" : {
				"_id" : 2,
				"item" : "bag",
				"qty" : 15
			}
		}
	],

Atomicity

MongoDB operations on a single document are atomic; this includes single-document insert, update, and delete operations. However, insertMany(), updateMany(), and bulkWrite() are not atomic when they operate on multiple documents. This raises a question: when an insertMany() call hits an error, how many documents actually get inserted?

// How many documents does this statement insert?
db.products.insertMany([
  {_id:3, item: "bag", qty: 15},
  {_id:4, item: "ruler", qty: 10},
  {_id:4, item: "cup", qty: 12},
  {_id:5, item: "key", qty: 14}
] )

Let’s review the syntax of insertMany().

// Syntax of insertMany
db.collection.insertMany(
	[<document 1>, <document 2>, ...],
  {
    writeConcern: <document>,
    ordered:<boolean>
  }
)

Parameter definition

parameter       type        description

document        document    An array of documents to insert into the collection
writeConcern    document    Optional. Specifies the write concern for the insert; if omitted, the default write concern is used
ordered         boolean     Optional. Whether to insert the documents in array order; defaults to true

The writeConcern parameter is discussed later. The ordered parameter changes how insertMany() behaves when an error occurs. When ordered is true (the default), MongoDB inserts the documents one by one in the order they appear in the array; if an error occurs during the insert, the operation stops and the remaining documents are not inserted. When ordered is false and an error occurs, MongoDB continues inserting the remaining documents.

db.products.insertMany([
  {_id:3, item: "bag", qty: 15},
  {_id:4, item: "ruler", qty: 10},
  {_id:4, item: "cup", qty: 12},
  {_id:5, item: "key", qty: 14}
] )

BulkWriteError({
	"writeErrors" : [
		{
			"index" : 2,
			"code" : 11000,
			"errmsg" : "E11000 duplicate key error collection: test.products index: _id_ dup key: { _id: 4 }",
			"op" : {
				"_id" : 4,
				"item" : "cup",
				"qty" : 12
			}
		}
	],
	"writeConcernErrors" : [ ],
	"nInserted" : 2,
	"nUpserted" : 0,
	"nMatched" : 0,
	"nModified" : 0,
	"nRemoved" : 0,
	"upserted" : [ ]
})

The returned result shows that 2 documents were inserted. The item: "cup" document failed because of a duplicate _id, and since the insert is ordered, the document with _id: 5 was never attempted.

Specify {ordered: false} and re-execute the insert:

db.products.insertMany([
  {_id:3, item: "bag", qty: 15},
  {_id:4, item: "ruler", qty: 10},
  {_id:4, item: "cup", qty: 12},
  {_id:5, item: "key", qty: 14}
], {
    ordered: false
} )

BulkWriteError({
	"writeErrors" : [
		{
			"index" : 2,
			"code" : 11000,
			"errmsg" : "E11000 duplicate key error collection: test.products index: _id_ dup key: { _id: 4 }",
			"op" : {
				"_id" : 4,
				"item" : "cup",
				"qty" : 12
			}
		}
	],
	"writeConcernErrors" : [ ],
	"nInserted" : 3,
	"nUpserted" : 0,
	"nMatched" : 0,
	"nModified" : 0,
	"nRemoved" : 0,
	"upserted" : [ ]
})

This time 3 documents were inserted successfully; only the item: "cup" document failed, again because of the duplicate _id.

Note that no transaction is used here. MongoDB transactions are atomic: if an error occurs while inserting data inside a transaction and the transaction aborts, none of the data is inserted.
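
For contrast, here is a minimal transaction sketch (assuming a replica set deployment, which transactions require; the colliding _id values are chosen only to force a failure): when the duplicate key error aborts the transaction, neither document is written.

// Transaction sketch: the duplicate _id forces an error and the whole insert is rolled back
var session = db.getMongo().startSession();
var coll = session.getDatabase("test").products;
session.startTransaction();
try {
    coll.insertMany([
        { _id: 20, item: "clip", qty: 25 },
        { _id: 20, item: "tape", qty: 30 }   // duplicate _id triggers the error
    ]);
    session.commitTransaction();             // not reached when the insert fails
} catch (e) {
    session.abortTransaction();              // roll back: neither document is inserted
    print("Transaction aborted: " + e);
} finally {
    session.endSession();
}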

writeConcern

Now let’s discuss writeConcern. MongoDB uses writeConcern to define how a write must be acknowledged across the replica set before the result is returned. When w is set to "majority", a majority of the replica set members must acknowledge the write before the primary returns the insert result to the client. If the secondaries do not acknowledge within the time specified by wtimeout, the insert reports a replication timeout error.

// This case may not be reproducible locally; try reducing the wtimeout value
db.products.insertMany(
      [
         { _id: 10, item: "large box", qty: 20 },
         { _id: 11, item: "small box", qty: 55 },
         { _id: 12, item: "medium box", qty: 30 }
      ],
      { w: "majority", wtimeout: 100 }
   );

WriteConcernError({
   "code" : 64,
   "errmsg" : "waiting for replication timed out",
   "errInfo" : {
     "wtimeout" : true,
     "writeConcern" : {    // Added in MongoDB 4.4
       "w" : "majority",
       "wtimeout" : 100,
       "provenance" : "getLastErrorDefaults"
     }
   }
})

Insert quantity

The number of documents in a single insert operation cannot exceed the maxWriteBatchSize limit, which defaults to 100,000. This limit helps avoid errors from oversized batch inserts. Some drivers split the data into batches of maxWriteBatchSize automatically: for example, inserting 200,000 documents may be divided by the driver into two operations of 100,000 documents each.
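
A minimal sketch of batching on the client side, assuming a shell where db.hello() is available (the products_bulk collection and the generated documents are hypothetical): read the server-advertised limit and split the array yourself.

// Read the server-advertised limit from the hello reply (typically 100000)
var limit = db.hello().maxWriteBatchSize;

// Hypothetical data set of 250,000 documents, inserted in batches no larger than the limit
var docs = Array.from({ length: 250000 }, function (v, i) { return { _id: i, qty: i % 100 }; });
for (var i = 0; i < docs.length; i += limit) {
    db.products_bulk.insertMany(docs.slice(i, i + limit), { ordered: false });
}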

Execution plan

The insertOne() and insertMany() methods do not support using the db.collection.explain() method to obtain the execution plan.
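
For comparison, explain() can still be used with query and update helpers; a minimal sketch with an arbitrary filter:

// explain() works with helpers such as find(), update(), and remove()
db.products.explain("executionStats").find({ qty: { $gt: 10 } })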

Performance

When bulk-inserting documents whose indexed fields contain random values (such as hash values), insert performance can degrade significantly, because updating the indexes with random keys consumes a lot of CPU and memory. For this kind of data it is recommended to drop the index on the collection before the bulk insert and rebuild it after the insert completes, or to insert the data into a collection that does not have the index.
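
A minimal sketch of the drop-and-rebuild approach (the hashValue field and the hashValue_1 index name are hypothetical):

// Drop the index on the random-valued field before the bulk insert
db.products.dropIndex("hashValue_1")

// ... run the large insertMany() batches here ...

// Rebuild the index once the data is loaded
db.products.createIndex({ hashValue: 1 })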

Origin blog.csdn.net/wilsonzane/article/details/135391440