MongoDB Official Documentation Study Notes (2): Documents, BSON Types, and Comparison/Sort Order

Documents

MongoDB stores documents as BSON. BSON is a binary representation of JSON documents, and it supports more data types than JSON does.


Document Structure

MongoDB documents are composed of field-and-value pairs and have the following structure:

{
   field1: value1,
   field2: value2,
   field3: value3,
   ...
   fieldN: valueN
}

The value of a field can be any BSON data type, including arrays, other documents, and arrays of documents. For example:


var mydoc = {
   _id: ObjectId("5099803df3f4948bd2f98391"),
   name: { first: "Alan", last: "Turing" },
   birth: new Date('Jun 23, 1912'),
   death: new Date('Jun 07, 1954'),
   contribs: [ "Turing machine", "Turing test", "Turingery" ],
   views: NumberLong(1250000)
}

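The document above contains the following fields:

  • _id holds an ObjectId;
  • name holds an embedded document that contains the fields first and last;
  • birth and death hold values of the Date type;
  • contribs holds an array of strings;
  • views holds a value of the NumberLong type.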

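Field Names

Field names are strings.

Documents have the following restrictions on field names:

  • The field name _id is reserved for use as the primary key; its value must be unique, is immutable, and may be of any type other than an array;
  • Field names cannot start with the dollar sign ($) character;
  • Field names cannot contain the dot (.) character;
  • Field names cannot contain the null character.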
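
The restrictions listed above can be checked on the client before a document is sent to the server. The following is a minimal sketch in plain JavaScript, not MongoDB's own validation: the helper name checkFieldName is hypothetical, and it encodes only the rules stated in this note (server versions may differ).

```javascript
// Hypothetical helper: returns the list of rule violations for one field name.
function checkFieldName(name) {
  const problems = [];
  if (typeof name !== "string") {
    problems.push("field name must be a string");
    return problems;
  }
  if (name.startsWith("$")) problems.push("must not start with $");
  if (name.includes(".")) problems.push("must not contain a dot");
  if (name.includes("\0")) problems.push("must not contain the null character");
  return problems;
}

console.log(checkFieldName("views"));      // no violations
console.log(checkFieldName("$price"));     // violates the $ rule
console.log(checkFieldName("name.first")); // violates the dot rule
```

Returning a list rather than a boolean lets a caller report every violated rule at once.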

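BSON documents may contain more than one field with the same name, but a document stored in MongoDB must not contain duplicate field names. Some documents created by internal MongoDB processes may have duplicate fields, but no MongoDB process will ever add duplicate fields to an existing user document.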

Field Value Limit

For indexed collections, the values for the indexed fields have a Maximum Index Key Length limit. See Maximum Index Key Length for details.

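Dot Notation

MongoDB uses dot notation to access the elements of an array and the fields of an embedded document.

Arrays

To specify or access an element of an array by its zero-based index position, concatenate the array field name with a dot (.) and the index position, and enclose the result in quotes: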

"<array>.<index>"

For example, given the following field in a document:

{
   ...
   contribs: [ "Turing machine", "Turing test", "Turingery" ],
   ...
}

The expression that accesses the third element of the contribs array, "Turingery", is "contribs.2".

SEE ALSO

  • $[] all positional operator for update operations,
  • $[<identifier>] filtered positional operator for update operations,
  • $ positional operator for update operations,
  • $ projection operator when array index position is unknown
  • Query an Array for dot notation examples with arrays.
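
Embedded Documents

To specify or access a field of an embedded document, concatenate the embedded document's field name with a dot (.) and the name of the field inside it, and enclose the result in quotes: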

"<embedded document>.<field>"

For example, given the following fields in a document:

{
   ...
   name: { first: "Alan", last: "Turing" },
   contact: { phone: { type: "cell", number: "111-222-3333" } },
   ...
}

  • To access the last field inside the name field, use "name.last";
  • To access the number field of the phone document inside the contact field, use "contact.phone.number".
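
To make the two forms of dot notation concrete, here is a small sketch in plain JavaScript that walks an ordinary object along a dotted path. The helper resolvePath is hypothetical and only models simple path lookup; MongoDB's real query semantics are richer (for example, when a path crosses an array of documents).

```javascript
// Hypothetical helper: follow a dot-separated path through a plain object.
function resolvePath(doc, path) {
  let current = doc;
  for (const part of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = current[part]; // "2" indexes an array, "last" names a nested field
  }
  return current;
}

const doc = {
  name: { first: "Alan", last: "Turing" },
  contribs: ["Turing machine", "Turing test", "Turingery"],
  contact: { phone: { type: "cell", number: "111-222-3333" } }
};

console.log(resolvePath(doc, "contribs.2"));           // Turingery
console.log(resolvePath(doc, "name.last"));            // Turing
console.log(resolvePath(doc, "contact.phone.number")); // 111-222-3333
```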


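Document Limitations

Documents have the following limits:

Document Size Limit

The maximum BSON document size is 16 megabytes.

The maximum document size helps ensure that a single document cannot use an excessive amount of RAM or, during transmission, an excessive amount of bandwidth. To store documents larger than the maximum size, MongoDB provides the GridFS API. (See mongofiles and the documentation for your driver for more information about GridFS.)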

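Document Field Order

MongoDB preserves the order of the document fields following write operations, except in the following cases:

  • The _id field is always the first field in the document;
  • Updates that rename fields may result in the fields being reordered.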

Changed in version 2.6: Starting in version 2.6, MongoDB actively attempts to preserve the field order in a document. Before version 2.6, MongoDB did not actively preserve the order of the fields in a document.

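The _id Field

In MongoDB, each document stored in a collection has a unique, immutable _id field that acts as its primary key. If an inserted document omits the _id field, MongoDB automatically generates an ObjectId for it. This applies to the insert command and also to documents inserted through update operations with upsert: true.

The _id field has the following behavior and constraints:

  • It can be added automatically;
  • The _id field is always the first field in the document. If a document is stored without _id as its first field, MongoDB moves the field to the beginning;
  • The _id field may contain values of any BSON data type other than an array.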

WARNING

To ensure functioning replication, do not store values that are of the BSON regular expression type in the _id field.

The following are common options for storing values for _id:

  • Use an ObjectId.

  • Use a natural unique identifier, if available. This saves space and avoids an additional index.

  • Generate an auto-incrementing number.

  • Generate a UUID in your application code. For a more efficient storage of the UUID values in the collection and in the _id index, store the UUID as a value of the BSON BinData type.

    Index keys that are of the BinData type are more efficiently stored in the index if:

    • the binary subtype value is in the range of 0-7 or 128-135, and
    • the length of the byte array is: 0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, or 32.
  • Use your driver’s BSON UUID facility to generate UUIDs. Be aware that driver implementations may implement UUID serialization and deserialization logic differently, which may not be fully compatible with other drivers. See your driver documentation for information concerning UUID interoperability.
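
The two conditions for efficiently stored BinData index keys can be written down as a small predicate. This is only a sketch of the rule as stated in the list above, in plain JavaScript; the names COMPACT_LENGTHS and isCompactBinDataKey are hypothetical and not part of any driver.

```javascript
// Byte-array lengths that the note lists as efficiently stored.
const COMPACT_LENGTHS = new Set([0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, 32]);

// Hypothetical helper: true when both conditions from the note hold.
function isCompactBinDataKey(subtype, byteLength) {
  const subtypeOk =
    (subtype >= 0 && subtype <= 7) || (subtype >= 128 && subtype <= 135);
  return subtypeOk && COMPACT_LENGTHS.has(byteLength);
}

console.log(isCompactBinDataKey(4, 16)); // true: a 16-byte UUID stored with subtype 4
console.log(isCompactBinDataKey(4, 36)); // false: 36 is not one of the listed lengths
console.log(isCompactBinDataKey(9, 16)); // false: subtype 9 is outside both ranges
```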

NOTE

Most MongoDB driver clients will include the _id field and generate an ObjectId before sending the insert operation to MongoDB; however, if the client sends a document without an _id field, the mongod will add the _id field and generate the ObjectId.


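BSON Types

BSON is a binary serialization format used to store documents and make remote procedure calls in MongoDB. Each BSON type has both an integer identifier and a string identifier, as listed in the following table: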

Type                     Number  Alias                  Notes
Double                   1       "double"
String                   2       "string"
Object                   3       "object"
Array                    4       "array"
Binary data              5       "binData"
Undefined                6       "undefined"            Deprecated.
ObjectId                 7       "objectId"
Boolean                  8       "bool"
Date                     9       "date"
Null                     10      "null"
Regular Expression       11      "regex"
DBPointer                12      "dbPointer"            Deprecated.
JavaScript               13      "javascript"
Symbol                   14      "symbol"               Deprecated.
JavaScript (with scope)  15      "javascriptWithScope"
32-bit integer           16      "int"
Timestamp                17      "timestamp"
64-bit integer           18      "long"
Decimal128               19      "decimal"              New in version 3.4.
Min key                  -1      "minKey"
Max key                  127     "maxKey"

You can use the $type operator to query documents by the BSON type of a field.
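
As an illustration of the two kinds of identifier, the sketch below copies the table above into a lookup map and builds two equivalent $type filter documents, one using the number and one using the alias. It is plain JavaScript that only constructs objects; the names BSON_TYPE_ALIASES and bsonTypeAlias are hypothetical, and the views field is borrowed from the sample document earlier in this note.

```javascript
// Type numbers and aliases copied from the table above.
const BSON_TYPE_ALIASES = new Map([
  [1, "double"], [2, "string"], [3, "object"], [4, "array"],
  [5, "binData"], [6, "undefined"], [7, "objectId"], [8, "bool"],
  [9, "date"], [10, "null"], [11, "regex"], [12, "dbPointer"],
  [13, "javascript"], [14, "symbol"], [15, "javascriptWithScope"],
  [16, "int"], [17, "timestamp"], [18, "long"], [19, "decimal"],
  [-1, "minKey"], [127, "maxKey"]
]);

// Hypothetical helper: look up the string alias for a type number.
function bsonTypeAlias(typeNumber) {
  return BSON_TYPE_ALIASES.get(typeNumber);
}

// Two filter documents that select the same thing: views stored as a 64-bit integer.
const byNumber = { views: { $type: 18 } };
const byAlias = { views: { $type: bsonTypeAlias(18) } };

console.log(JSON.stringify(byNumber)); // {"views":{"$type":18}}
console.log(JSON.stringify(byAlias));  // {"views":{"$type":"long"}}
```

Either filter could then be passed to a find() call in the shell or a driver.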

To determine a field’s type, see Check Types in the mongo Shell.

If you convert BSON to JSON, see the Extended JSON reference.

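ObjectId

ObjectIds are small, likely unique, fast to generate, and ordered. An ObjectId value consists of 12 bytes, where the first four bytes are a timestamp that reflects when the ObjectId was created. The 12 bytes are laid out as follows:

  • a 4-byte value representing the seconds since the Unix epoch,
  • a 3-byte machine identifier,
  • a 2-byte process id, and
  • a 3-byte counter, starting with a random value.

In MongoDB, each document stored in a collection has a unique, immutable _id field that acts as its primary key. If an inserted document omits the _id field, MongoDB automatically generates an ObjectId for it. This applies to the insert command and also to documents inserted through update operations with upsert: true.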

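MongoDB clients should add an _id field with a unique ObjectId. Using ObjectIds for the _id field provides the following additional benefits:

  • In the mongo shell, you can read the creation time of an ObjectId with the ObjectId.getTimestamp() method;
  • Sorting on an _id field that stores ObjectId values is roughly equivalent to sorting by creation time.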
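
Because the leading four bytes are a seconds-since-epoch value, the creation time can be recovered from the hexadecimal string alone. The following plain JavaScript sketch shows the arithmetic; objectIdSeconds is a hypothetical helper written for this note, not the shell's getTimestamp() implementation.

```javascript
// Hypothetical helper: read the leading 4 bytes (8 hex digits) as seconds since the Unix epoch.
function objectIdSeconds(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  return parseInt(hex.slice(0, 8), 16);
}

// The _id of the sample document from the start of this note.
const seconds = objectIdSeconds("5099803df3f4948bd2f98391");
console.log(seconds);                                // 1352237117
console.log(new Date(seconds * 1000).toISOString()); // 2012-11-06T21:25:17.000Z
```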

IMPORTANT

While ObjectId values should increase over time, they are not necessarily monotonic. This is because they:

  • Only contain one second of temporal resolution, so ObjectId values created within the same second do not have a guaranteed ordering, and
  • Are generated by clients, which may have differing system clocks.

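String

BSON strings are UTF-8. In general, drivers for each programming language convert from the language's string format to UTF-8 when serializing and deserializing BSON. This makes it possible to store and transmit most international characters in BSON strings with ease. [1] In addition, MongoDB $regex queries support UTF-8 in the regex string.

  [1] Given strings using UTF-8 character sets, using sort() on strings will be reasonably correct. However, because internally sort() uses the C++ strcmp api, the sort order may handle some characters incorrectly.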

Timestamps

BSON has a special timestamp type for internal MongoDB use. It is not associated with the regular Date type. Timestamp values are 64-bit values where:

  • the first 32 bits are a time_t value (seconds since the Unix epoch)
  • the second 32 bits are an incrementing ordinal for operations within a given second.

Within a single mongod instance, timestamp values are always unique.

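NOTE

The BSON timestamp type is for internal MongoDB use. In most cases, application development should use the BSON date type instead.

If you insert a document that contains an empty BSON timestamp in a top-level field, MongoDB replaces the empty value with the current timestamp value. For example: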

var a = new Timestamp();

db.test.insertOne( { ts: a } );

The stored document then resembles the following (the timestamp and _id values will differ from run to run):

{
    "_id" : ObjectId("5b4add4bb53c6061fcadcb4f"),
    "ts" : Timestamp(1531632971, 1)
}

If ts is a field inside an embedded document, MongoDB leaves it as an empty timestamp value and does not fill it in.

Changed in version 2.6: Previously, the server would only replace empty timestamp values in the first two fields, including _id, of an inserted document. Now MongoDB will replace any top-level field.

Date

BSON Date is a 64-bit integer that represents the number of milliseconds since the Unix epoch (Jan 1, 1970). This gives a representable date range of about 290 million years into the past and the future.

The official BSON specification refers to the BSON Date type as the UTC datetime.

BSON Date type is signed. [2] Negative values represent dates before 1970.
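
The sign convention is easy to observe with JavaScript's own Date, which also counts milliseconds from the Unix epoch. This is only an analogy in plain JavaScript: the JavaScript Date range (about ±8.64e15 ms) is narrower than a full 64-bit integer.

```javascript
// One millisecond before the epoch is a negative count.
const beforeEpoch = new Date(-1);
console.log(beforeEpoch.toISOString()); // 1969-12-31T23:59:59.999Z

// Alan Turing's birth date, used earlier in this note, as a millisecond count (UTC).
const birthMillis = Date.UTC(1912, 5, 23); // month 5 is June (zero-indexed)
console.log(birthMillis < 0);                      // true
console.log(new Date(birthMillis).toISOString());  // 1912-06-23T00:00:00.000Z
```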

In the mongo shell, construct a BSON Date with the new Date() constructor:

var mydate1 = new Date(); // mydate1.toString() prints e.g. Sun Jul 15 2018 13:47:00 GMT+0800

Or with the ISODate() constructor:

var mydate2 = ISODate();  // mydate2.toString() prints e.g. Sun Jul 15 2018 13:47:00 GMT+0800

Return the Date value as a string:

mydate1.toString();  // prints e.g. Sun Jul 15 2018 13:47:00 GMT+0800

Return the month portion of the Date value. Months are zero-indexed, so January is month 0:

mydate1.getMonth();  // returns 6 when the current month is July

Comparison/Sort Order

When comparing values of different BSON types, MongoDB uses the following comparison order, from lowest to highest:

  1. MinKey (internal type)
  2. Null
  3. Numbers (ints, longs, doubles, decimals)
  4. Symbol, String
  5. Object
  6. Array
  7. BinData
  8. ObjectId
  9. Boolean
  10. Date
  11. Timestamp
  12. Regular Expression
  13. MaxKey (internal type)
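
The cross-type order above can be modeled as a rank function followed by an ordinary sort. The sketch below covers only values that plain JavaScript can represent (no ObjectId, BinData, or Timestamp), ignores the special handling of arrays described later, and uses a hypothetical helper named typeRank; it illustrates the ordering and is not MongoDB's comparison code.

```javascript
// Rank numbers follow the list above; only JavaScript-native values are modeled.
function typeRank(value) {
  if (value === null) return 2;
  if (typeof value === "number") return 3;
  if (typeof value === "string") return 4;
  if (typeof value === "boolean") return 9;
  if (Array.isArray(value)) return 6;
  if (value instanceof Date) return 10;
  if (value instanceof RegExp) return 12;
  if (typeof value === "object") return 5;
  throw new Error("type not modeled in this sketch");
}

const mixed = [true, "a", null, 5, new Date(0), [1], { x: 1 }, /re/];
const sorted = [...mixed].sort((a, b) => typeRank(a) - typeRank(b));

console.log(sorted.map(typeRank).join(",")); // 2,3,4,5,6,9,10,12
```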


Numeric Types

MongoDB treats some types as equivalent for comparison purposes. For instance, numeric types undergo conversion before comparison.

Strings

Binary Comparison

By default, MongoDB uses simple binary comparison to compare strings.

Collation

New in version 3.4.

Collation allows users to specify language-specific rules for string comparison, such as rules for lettercase and accent marks.

Collation specification has the following syntax:

{
   locale: <string>,
   caseLevel: <boolean>,
   caseFirst: <string>,
   strength: <int>,
   numericOrdering: <boolean>,
   alternate: <string>,
   maxVariable: <string>,
   backwards: <boolean>
}

When specifying collation, the locale field is mandatory; all other collation fields are optional. For descriptions of the fields, see Collation Document.

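If no collation is specified for the collection or for the operation, MongoDB uses the simple binary comparison used in prior versions for string comparisons.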
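
The difference between binary comparison and language-aware comparison can be seen in Node.js without a database. In this sketch, Buffer.compare on UTF-8 bytes stands in for binary comparison and Intl.Collator stands in for a collation with locale "en"; both are assumptions made for illustration and are not MongoDB's own code. The helper name binaryCompare is hypothetical.

```javascript
// Binary comparison: compare the UTF-8 bytes of each string.
function binaryCompare(a, b) {
  return Buffer.compare(Buffer.from(a, "utf8"), Buffer.from(b, "utf8"));
}

const words = ["b", "a", "B"];
console.log([...words].sort(binaryCompare).join(",")); // B,a,b (uppercase bytes are smaller)

// Language-aware comparison that ignores letter case.
const collator = new Intl.Collator("en", { sensitivity: "base" });
console.log(collator.compare("a", "A")); // 0: equal when case is ignored
console.log(binaryCompare("a", "A"));    // 1: the bytes differ
```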

Arrays

With arrays, a less-than comparison or an ascending sort compares the smallest element of arrays, and a greater-than comparison or a descending sort compares the largest element of the arrays. As such, when comparing a field whose value is a single-element array (e.g. [ 1 ]) with non-array fields (e.g. 2), the comparison is between 1 and 2. A comparison of an empty array (e.g. [ ]) treats the empty array as less than null or a missing field.
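
The rule for non-empty arrays can be sketched as choosing one representative element per direction. The helper arrayComparisonKey below is hypothetical, handles numeric elements only, and leaves empty arrays out because they follow the separate rule stated above.

```javascript
// Hypothetical helper: the value an array contributes to a comparison.
// direction 1 = ascending sort or less-than, -1 = descending sort or greater-than.
function arrayComparisonKey(value, direction) {
  if (!Array.isArray(value)) return value;
  if (value.length === 0) return undefined; // empty arrays are handled by a separate rule
  return direction === 1 ? Math.min(...value) : Math.max(...value);
}

console.log(arrayComparisonKey([3, 1, 2], 1));  // 1
console.log(arrayComparisonKey([3, 1, 2], -1)); // 3

// A single-element array [1] compared with the plain value 2 is 1 versus 2.
console.log(arrayComparisonKey([1], 1) < arrayComparisonKey(2, 1)); // true
```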


Objects

MongoDB compares BSON objects in the following order:

  1. Recursively compare key-value pairs in the order that they appear within the BSON object.
  2. Compare the key field names.
  3. If the key field names are equal, compare the field values.
  4. If the field values are equal, compare the next key/value pair (return to step 1). An object without further pairs is less than an object with further pairs.
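
The steps above translate into a short comparison loop. This sketch follows only the steps listed in this note and assumes flat objects whose values are all numbers or all strings; compareScalars and compareObjects are hypothetical helpers, and plain JavaScript objects are used in place of BSON objects (keys are assumed to be non-numeric so that insertion order is kept).

```javascript
// Compare two scalars of the same type (numbers or strings only in this sketch).
function compareScalars(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Hypothetical helper: compare flat objects pair by pair, in insertion order.
function compareObjects(left, right) {
  const l = Object.entries(left);
  const r = Object.entries(right);
  for (let i = 0; i < Math.min(l.length, r.length); i++) {
    const byName = compareScalars(l[i][0], r[i][0]);  // field names first
    if (byName !== 0) return byName;
    const byValue = compareScalars(l[i][1], r[i][1]); // then field values
    if (byValue !== 0) return byValue;
  }
  // The object that runs out of pairs first is the smaller one.
  return Math.sign(l.length - r.length);
}

console.log(compareObjects({ a: 1, b: 2 }, { a: 1, b: 3 })); // -1: values differ at b
console.log(compareObjects({ a: 1 }, { a: 1, b: 0 }));       // -1: left has no further pairs
console.log(compareObjects({ a: 1, b: 2 }, { b: 2, a: 1 })); // -1: field order matters
```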


Dates and Timestamps

Changed in version 3.0.0: Date objects sort before Timestamp objects. Previously, Date and Timestamp objects sorted together.

Non-existent Fields

The comparison treats a non-existent field as it would an empty BSON object. As such, a sort on the a field in documents { } and { a: null } would treat the documents as equivalent in sort order.

BinData

MongoDB sorts BinData in the following order:

  1. First, the length or size of the data.
  2. Then, by the BSON one-byte subtype.
  3. Finally, by the data, performing a byte-by-byte comparison.
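
The three sort keys can be expressed as a comparator. The sketch below assumes a hypothetical in-memory shape { subtype, data } with data as a Uint8Array; the function compareBinData is written for this note and is not a driver API.

```javascript
// Hypothetical shape for a binary value: { subtype: number, data: Uint8Array }.
function compareBinData(a, b) {
  // 1. Length of the data.
  if (a.data.length !== b.data.length) return a.data.length < b.data.length ? -1 : 1;
  // 2. One-byte subtype.
  if (a.subtype !== b.subtype) return a.subtype < b.subtype ? -1 : 1;
  // 3. Byte-by-byte comparison of the data.
  for (let i = 0; i < a.data.length; i++) {
    if (a.data[i] !== b.data[i]) return a.data[i] < b.data[i] ? -1 : 1;
  }
  return 0;
}

const shorter = { subtype: 0, data: Uint8Array.of(0xff) };
const longer = { subtype: 0, data: Uint8Array.of(0x00, 0x00) };

// Length is checked first, so the shorter value sorts first whatever its bytes are.
console.log(compareBinData(shorter, longer)); // -1
```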


The above reflects my own understanding of the official manual. If anything is unclear, refer directly to the official manual:

https://docs.mongodb.com/manual/


Reposted from blog.csdn.net/zhujq_icode/article/details/81051370