PyMongo Beginner Tutorial

Tutorial

This tutorial is an introduction to the basics of working with MongoDB and PyMongo.

Prerequisites

Before we start, make sure that you have the PyMongo distribution installed. In the Python shell, the following should run without raising an exception:

>>> import pymongo

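This tutorial assumes that a MongoDB instance is running on the default host and port. If you have downloaded and installed MongoDB, you can start it like this: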

$ mongod

Making a Connection with MongoClient

The first step when working with PyMongo is to create a MongoClient connected to the running mongod instance:

>>> from pymongo import MongoClient
>>> client = MongoClient()

The code above connects to the default host and port. We can also specify the host and port explicitly:

>>> client = MongoClient('localhost', 27017)

Or use the MongoDB URI format:

>>> client = MongoClient('mongodb://localhost:27017/')
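MongoClient('localhost', 27017) and the URI above describe the same endpoint. As a rough illustration only, the standard library's urllib.parse can pull the host and port out of a simple single-host URI. The helper name host_and_port is hypothetical; this is not how PyMongo parses URIs, and it ignores multi-host lists, credentials and query options.

```python
from typing import Tuple
from urllib.parse import urlsplit

DEFAULT_HOST = "localhost"  # the defaults used by MongoClient() in this tutorial
DEFAULT_PORT = 27017

def host_and_port(uri: str) -> Tuple[str, int]:
    """Hypothetical helper: extract (host, port) from a single-host mongodb:// URI."""
    parts = urlsplit(uri)
    if parts.scheme != "mongodb":
        raise ValueError(f"not a mongodb:// URI: {uri!r}")
    host = parts.hostname or DEFAULT_HOST  # fall back when the host is omitted
    port = parts.port or DEFAULT_PORT      # fall back when the port is omitted
    return host, port

print(host_and_port("mongodb://localhost:27017/"))  # ('localhost', 27017)
print(host_and_port("mongodb://db.example.com/"))   # ('db.example.com', 27017)
```

The fallbacks mirror why MongoClient() with no arguments and MongoClient('localhost', 27017) end up at the same place.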

Getting a Database

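A single instance of MongoDB can support multiple independent databases. When working with PyMongo, you access a database using attribute-style access on the MongoClient instance: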

>>> db = client.test_database

If your database name does not work with attribute-style access (for example test-database), you can use dictionary-style access instead:

>>> db = client['test-database']
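Why does test-database need the dictionary form? In Python, client.test-database parses as (client.test) - database, a subtraction. The toy class below is a hypothetical stand-in, not MongoClient itself; it sketches how an object can route attribute access through item access so both spellings reach the same database, while names that are not valid identifiers only work with brackets.

```python
class ToyClient:
    """Hypothetical stand-in for MongoClient: attribute and item access agree."""

    def __getitem__(self, name):
        return f"<database {name}>"

    def __getattr__(self, name):
        # Only called for attributes not found the normal way; delegate to item access,
        # but leave private/dunder-style names alone.
        if name.startswith("_"):
            raise AttributeError(name)
        return self[name]

client = ToyClient()
print(client.test_database == client["test_database"])  # True
print("test-database".isidentifier())                   # False
print(client["test-database"])                          # <database test-database>
```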

Getting a Collection

A collection is a group of documents stored in MongoDB, roughly the equivalent of a table in a relational database. Getting a collection works the same way as getting a database:

>>> collection = db.test_collection

Or (using dictionary-style access):

>>> collection = db['test-collection']

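An important point to note: collections (and databases) in MongoDB are created lazily. None of the commands above has actually performed any operation on the MongoDB server. Collections and databases are created only when the first document is inserted into them.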
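To make the lazy-creation rule concrete, here is a purely in-memory toy model. All class and method names are hypothetical and are not PyMongo's API: getting a collection handle records nothing, and only the first insert creates the database and collection entries.

```python
class ToyServer:
    """Hypothetical in-memory model: handles are cheap, storage appears on first insert."""

    def __init__(self):
        self.storage = {}  # {db_name: {collection_name: [documents]}}

    def collection(self, db_name, coll_name):
        # Getting a handle does not touch self.storage at all.
        return ToyCollection(self, db_name, coll_name)

class ToyCollection:
    def __init__(self, server, db_name, name):
        self.server, self.db_name, self.name = server, db_name, name

    def insert_one(self, doc):
        # The first write is what actually creates the database and the collection.
        database = self.server.storage.setdefault(self.db_name, {})
        database.setdefault(self.name, []).append(doc)

server = ToyServer()
posts = server.collection("test_database", "posts")
print(server.storage)                         # {}
posts.insert_one({"author": "Mike"})
print(list(server.storage["test_database"]))  # ['posts']
```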

Documents

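Data in MongoDB is represented (and stored) using JSON-style documents. In PyMongo we use dictionaries to represent documents. For example, the following dictionary could represent a blog post: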

>>> import datetime
>>> post = {"author": "Mike",
...         "text": "My first blog post!",
...         "tags": ["mongodb", "python", "pymongo"],
...         "date": datetime.datetime.utcnow()}

Note that documents can contain native Python types (such as datetime.datetime instances); these are converted into the appropriate BSON types.
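A sketch of the kind of mapping involved, using the BSON type names from the BSON specification. The function bson_type is hypothetical and covers only common types. PyMongo itself raises bson.errors.InvalidDocument for values it cannot encode, whereas this toy raises TypeError. Also keep in mind that BSON datetimes have millisecond precision, so microseconds from datetime.datetime.utcnow() do not survive a round trip.

```python
import datetime

def bson_type(value):
    """Hypothetical helper: name the BSON type a Python value would typically be stored as."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return "int32"
        if -2**63 <= value < 2**63:
            return "int64"
        raise OverflowError("BSON integers are at most 8 bytes")
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, datetime.datetime):
        return "UTC datetime"
    if isinstance(value, (list, tuple)):
        return "array"
    if isinstance(value, dict):
        return "embedded document"
    if value is None:
        return "null"
    raise TypeError(f"no BSON mapping for {type(value).__name__}")

post = {"author": "Mike",
        "text": "My first blog post!",
        "tags": ["mongodb", "python", "pymongo"],
        "date": datetime.datetime(2009, 11, 12, 11, 14)}
print({key: bson_type(value) for key, value in post.items()})
# {'author': 'string', 'text': 'string', 'tags': 'array', 'date': 'UTC datetime'}
```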

Inserting a Document

To insert a document into a collection, use the insert_one() method:

>>> posts = db.posts
>>> post_id = posts.insert_one(post).inserted_id
>>> post_id
ObjectId('...')

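When a document is inserted, a special key, "_id", is automatically added if the document does not already contain one. The value of "_id" must be unique across the collection. insert_one() returns an instance of InsertOneResult. For more information about "_id", see the corresponding documentation.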
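A rough sketch of the "add _id only if it is missing" rule. It uses a hypothetical toy id generator instead of bson.ObjectId; the layout mimics the current ObjectId format (a 4-byte big-endian timestamp, 5 random bytes, and a 3-byte counter) rendered as 24 hex digits. Real ObjectIds expose their creation time through the generation_time property; id_timestamp below is only an illustrative equivalent.

```python
import itertools
import os
import time

_RANDOM = os.urandom(5)  # 5 random bytes chosen once per process
_COUNTER = itertools.count(int.from_bytes(os.urandom(3), "big"))  # random starting point

def toy_object_id():
    """Hypothetical 24-hex-digit id: 4-byte timestamp + 5 random bytes + 3-byte counter."""
    timestamp = int(time.time()).to_bytes(4, "big")
    counter = (next(_COUNTER) & 0xFFFFFF).to_bytes(3, "big")
    return (timestamp + _RANDOM + counter).hex()

def id_timestamp(hex_id):
    """Seconds since the epoch, read back from the first 4 bytes (8 hex digits)."""
    return int(hex_id[:8], 16)

def ensure_id(doc):
    """Add "_id" only if the document lacks one; mutates doc and returns the id."""
    if "_id" not in doc:
        doc["_id"] = toy_object_id()
    return doc["_id"]

post = {"author": "Mike"}
print(ensure_id(post))               # 24 hex digits, different on every run
print(ensure_id({"_id": "my-key"}))  # my-key (an existing _id is kept)
```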

After the first document is inserted, the posts collection has actually been created on the server. We can verify this by listing all of the collections in our database:

>>> db.list_collection_names()
[u'posts']

Getting a Single Document with find_one()

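The most basic query in MongoDB is performed with find_one(). This method returns a single document matching a query, or None if there is no match. It is useful when you know there is only one matching document, or when you are only interested in the first match. Here we use find_one() to get the first document from the posts collection: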

>>> import pprint
>>> pprint.pprint(posts.find_one())
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}

The result is a dictionary matching the one we inserted earlier.

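Note that the returned document contains an "_id" field, which was automatically added on insert.

find_one() also supports querying on specific fields that the returned document must match. To limit our results to a document whose author is "Mike", we do: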

>>> pprint.pprint(posts.find_one({"author": "Mike"}))
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}

If we try a different author, such as "Eliot", we get no result:

>>> posts.find_one({"author": "Eliot"})
>>>
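To illustrate the matching rule, here is an in-memory toy version of find_one() over a list of dicts. The function names are hypothetical. It approximates two MongoDB behaviours: every query field must match, and a scalar query value matches an array field when the array contains it (so {"tags": "python"} would match our post). Without a sort, the real find_one() returns the first document in the server's natural order, which is not guaranteed to be insertion order; the toy simply uses list order.

```python
def matches(doc, query):
    """Toy equality matching; a scalar query value matches an array field by membership."""
    for key, expected in query.items():
        actual = doc.get(key)  # a missing field compares as None
        if isinstance(actual, list) and not isinstance(expected, list):
            if expected not in actual:
                return False
        elif actual != expected:
            return False
    return True

def find_one(docs, query=None):
    """Return the first matching document, or None if nothing matches."""
    for doc in docs:
        if matches(doc, query or {}):
            return doc
    return None

posts = [{"author": "Mike",
          "text": "My first blog post!",
          "tags": ["mongodb", "python", "pymongo"]}]
print(find_one(posts, {"author": "Mike"})["text"])  # My first blog post!
print(find_one(posts, {"author": "Eliot"}))         # None
```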

Querying by ObjectId

We can also find a post by its _id, which in our example is an ObjectId:

>>> post_id
ObjectId('...')
>>> pprint.pprint(posts.find_one({"_id": post_id}))
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}

Note that an ObjectId is not the same as its string representation:

>>> post_id_as_str = str(post_id)
>>> posts.find_one({"_id": post_id_as_str}) # No result
>>>

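A common task in web applications is to take an ObjectId from the request URL and find the matching document. In this case it is necessary to convert the string to an ObjectId before passing it to find_one():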

from bson.objectid import ObjectId
from pymongo import MongoClient

client = MongoClient()

# The web framework gets post_id from the URL and passes it as a string
def get(post_id):
    # Convert from string to ObjectId, then look up the matching document:
    document = client.db.collection.find_one({'_id': ObjectId(post_id)})
    return document
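ObjectId(post_id) raises bson.errors.InvalidId when the string is not a valid id, and bson's ObjectId.is_valid() can check beforehand. The stdlib-only sketch below approximates that check for the 24-hex-digit string form, so a handler could answer "not found" instead of crashing. The function name is hypothetical, and it ignores the 12-byte bytes form that ObjectId also accepts.

```python
import re

_HEX24 = re.compile(r"[0-9a-fA-F]{24}")

def looks_like_object_id(value):
    """Approximate check for the 24-hex-digit string form of an ObjectId."""
    return isinstance(value, str) and _HEX24.fullmatch(value) is not None

print(looks_like_object_id("5a1b2c3d4e5f6a7b8c9d0e1f"))  # True
print(looks_like_object_id("not-an-id"))                 # False
```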

A Note on Unicode Strings

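You may have noticed that the regular Python strings we stored earlier look different when retrieved from the server (u'Mike' instead of 'Mike'). A short explanation is in order.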

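MongoDB stores data in BSON format. BSON strings are UTF-8 encoded, so PyMongo must ensure that any string it stores contains only valid UTF-8 data. Regular strings (Python 2's str type) are validated and then stored unaltered; Unicode strings are encoded to UTF-8 first. The reason the strings in our example appear in the Python shell as u'Mike' rather than 'Mike' is that PyMongo decodes every BSON string to a Unicode string, not a regular str.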
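As a sketch of what "BSON strings are UTF-8" means on the wire, following the BSON specification: a string value is a little-endian int32 giving the number of UTF-8 bytes plus one, then the bytes, then a trailing NUL. The two functions are hypothetical helpers written for illustration, not PyMongo's encoder. In Python 3 every str is already Unicode, which is why a Python 3 shell shows 'Mike' rather than u'Mike'.

```python
import struct

def encode_bson_string(text):
    """int32 little-endian (UTF-8 length + 1), the UTF-8 bytes, then a NUL terminator."""
    data = text.encode("utf-8")  # raises UnicodeEncodeError for e.g. lone surrogates
    return struct.pack("<i", len(data) + 1) + data + b"\x00"

def decode_bson_string(raw):
    """Inverse of encode_bson_string; strict decoding rejects invalid UTF-8."""
    (length,) = struct.unpack_from("<i", raw, 0)
    body = raw[4:4 + length]
    if len(body) != length or body[-1:] != b"\x00":
        raise ValueError("malformed BSON string")
    return body[:-1].decode("utf-8")

print(encode_bson_string("Mike"))                      # b'\x05\x00\x00\x00Mike\x00'
print(decode_bson_string(encode_bson_string("中文")))  # 中文
```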

Bulk Inserts

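To make querying a little more interesting, let's insert a few more documents. Besides inserting a single document, we can also perform a bulk insert by passing a list as the first argument to insert_many(). This inserts every document in the list while sending only a single command to the server: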

>>> new_posts = [{"author": "Mike",
...               "text": "Another post!",
...               "tags": ["bulk", "insert"],
...               "date": datetime.datetime(2009, 11, 12, 11, 14)},
...              {"author": "Eliot",
...               "title": "MongoDB is fun",
...               "text": "and pretty easy too!",
...               "date": datetime.datetime(2009, 11, 10, 10, 45)}]
>>> result = posts.insert_many(new_posts)
>>> result.inserted_ids
[ObjectId('...'), ObjectId('...')]

There are a couple of interesting things to note about this example:

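The result of insert_many() now holds two ObjectId instances, one for each inserted document.
new_posts[1] has a different "shape" from the other posts: there is no "tags" field, and we have added a new field, "title". This is what we mean when we say that MongoDB is schema-free.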
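Both points can be sketched with a toy in-memory insert_many() over a plain list. The function and its integer ids are hypothetical (real inserts generate ObjectIds): one id comes back per document, in order, and nothing stops documents of different shapes from sitting side by side.

```python
import itertools

_next_id = itertools.count(1)  # toy ids; real inserts would generate ObjectIds

def insert_many(collection, documents):
    """Toy bulk insert: give each document an _id if missing, store all, return ids in order."""
    ids = []
    for doc in documents:
        if "_id" not in doc:
            doc["_id"] = next(_next_id)
        ids.append(doc["_id"])
    collection.extend(documents)  # the whole batch is stored in one call
    return ids

posts = []  # stands in for the posts collection
new_posts = [{"author": "Mike", "text": "Another post!", "tags": ["bulk", "insert"]},
             {"author": "Eliot", "title": "MongoDB is fun", "text": "and pretty easy too!"}]
print(insert_many(posts, new_posts))  # [1, 2]
print([sorted(doc) for doc in posts])
# [['_id', 'author', 'tags', 'text'], ['_id', 'author', 'text', 'title']]
```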

Querying for More Than One Document

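To get more than one document as the result of a query, use the find() method. find() returns a Cursor instance, which lets us iterate over all matching documents. For example, we can iterate over every document in the posts collection: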

>>> for post in posts.find():
...   pprint.pprint(post)
...
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'bulk', u'insert'],
 u'text': u'Another post!'}
{u'_id': ObjectId('...'),
 u'author': u'Eliot',
 u'date': datetime.datetime(...),
 u'text': u'and pretty easy too!',
 u'title': u'MongoDB is fun'}

Just as with find_one(), we can pass a document to find() to limit the returned results. Here we get only the documents whose author is "Mike":

>>> for post in posts.find({"author": "Mike"}):
...   pprint.pprint(post)
...
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'mongodb', u'python', u'pymongo'],
 u'text': u'My first blog post!'}
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'bulk', u'insert'],
 u'text': u'Another post!'}
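find() does not build a list up front: the Cursor pulls results from the server in batches as you iterate, and PyMongo's Cursor.batch_size() adjusts the batch size. The toy class below is hypothetical and runs over an in-memory list; it mimics that lazy, batched behaviour and counts simulated fetches so the laziness is visible.

```python
import itertools

class ToyCursor:
    """Hypothetical lazy cursor: hands out matching documents batch_size at a time."""

    def __init__(self, docs, query=None, batch_size=2):
        self._docs = docs
        self._query = query or {}
        self._batch_size = batch_size
        self.round_trips = 0  # simulated server fetches so far

    def _matching(self):
        # Toy filter: plain equality on every query field.
        for doc in self._docs:
            if all(doc.get(k) == v for k, v in self._query.items()):
                yield doc

    def __iter__(self):
        source = self._matching()
        while True:
            batch = list(itertools.islice(source, self._batch_size))
            if not batch:
                return
            self.round_trips += 1  # one simulated fetch per non-empty batch
            yield from batch

posts = [{"author": "Mike", "text": "My first blog post!"},
         {"author": "Mike", "text": "Another post!"},
         {"author": "Eliot", "text": "and pretty easy too!"}]
cursor = ToyCursor(posts, {"author": "Mike"})
print(cursor.round_trips)  # 0: nothing is fetched until we iterate
for post in cursor:
    print(post["text"])
print(cursor.round_trips)  # 1
```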

Counting

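If we just want to know how many documents match a query, we can call count_documents() instead of performing a full query. We can count all of the documents in a collection by passing an empty filter:

>>> posts.count_documents({})
3

or just the documents that match a specific query:

>>> posts.count_documents({"author": "Mike"})
2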

Range Queries

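MongoDB supports many different kinds of advanced queries. For example, we can limit the results to posts older than a certain date, while also sorting the results by author: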

>>> d = datetime.datetime(2009, 11, 12, 12)
>>> for post in posts.find({"date": {"$lt": d}}).sort("author"):
...   pprint.pprint(post)
...
{u'_id': ObjectId('...'),
 u'author': u'Eliot',
 u'date': datetime.datetime(...),
 u'text': u'and pretty easy too!',
 u'title': u'MongoDB is fun'}
{u'_id': ObjectId('...'),
 u'author': u'Mike',
 u'date': datetime.datetime(...),
 u'tags': [u'bulk', u'insert'],
 u'text': u'Another post!'}

Here we use the special "$lt" operator to perform a range query, and call sort() to sort the results by author.
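A toy, in-memory evaluator for the same query shape, to show what {"date": {"$lt": d}} plus sort("author") computes. The function names are hypothetical and only four comparison operators are handled. MongoDB compares only values of comparable types here; the toy approximates that by treating a TypeError as "no match". The first post's date is a fixed stand-in for utcnow().

```python
import datetime
import operator

_OPS = {"$lt": operator.lt, "$lte": operator.le, "$gt": operator.gt, "$gte": operator.ge}

def matches(doc, query):
    """Toy matcher: {"field": {"$lt": x}} compares, any other value means equality."""
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict) and cond and set(cond) <= set(_OPS):
            try:
                if value is None or not all(_OPS[op](value, bound) for op, bound in cond.items()):
                    return False
            except TypeError:  # e.g. str vs datetime: treat as "no match"
                return False
        elif value != cond:
            return False
    return True

def find_sorted(docs, query, sort_field):
    """Filter with matches(), then sort ascending on sort_field (every doc must have it)."""
    return sorted((d for d in docs if matches(d, query)), key=lambda d: d[sort_field])

posts = [{"author": "Mike", "date": datetime.datetime(2017, 6, 3)},  # stand-in for utcnow()
         {"author": "Mike", "date": datetime.datetime(2009, 11, 12, 11, 14)},
         {"author": "Eliot", "date": datetime.datetime(2009, 11, 10, 10, 45)}]
d = datetime.datetime(2009, 11, 12, 12)
print([p["author"] for p in find_sorted(posts, {"date": {"$lt": d}}, "author")])
# ['Eliot', 'Mike']
```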

Indexing

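Adding indexes can speed up certain queries and can also add extra functionality to querying and storing documents. In this example, we demonstrate how to create a unique index on a key, which rejects any document whose value for that key already exists in the index.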

First, we create the index:

>>> result = db.profiles.create_index([('user_id', pymongo.ASCENDING)], unique=True)
>>> sorted(list(db.profiles.index_information()))
[u'_id_', u'user_id_1']

Notice that we now have two indexes: one on _id, which MongoDB creates automatically, and the one on user_id that we just created.
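The name user_id_1 follows PyMongo's convention for default index names: each (key, direction) pair is rendered as key_direction and the pairs are joined with underscores, where pymongo.ASCENDING is 1 and pymongo.DESCENDING is -1. A stdlib re-creation of that convention, with a hypothetical function name:

```python
ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / pymongo.DESCENDING

def default_index_name(keys):
    """Build a name like 'user_id_1' from [(field, direction), ...] pairs."""
    return "_".join(f"{field}_{direction}" for field, direction in keys)

print(default_index_name([("user_id", ASCENDING)]))                       # user_id_1
print(default_index_name([("author", ASCENDING), ("date", DESCENDING)]))  # author_1_date_-1
```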

Now let's add some user profiles:

>>> user_profiles = [
...     {'user_id': 211, 'name': 'Luke'},
...     {'user_id': 212, 'name': 'Ziltoid'}]
>>> result = db.profiles.insert_many(user_profiles)

The index prevents us from inserting a document whose user_id already exists in the collection:

>>> new_profile = {'user_id': 213, 'name': 'Drew'}
>>> duplicate_profile = {'user_id': 212, 'name': 'Tommy'}
>>> result = db.profiles.insert_one(new_profile)  # This is fine.
>>> result = db.profiles.insert_one(duplicate_profile)
Traceback (most recent call last):
DuplicateKeyError: E11000 duplicate key error index: test_database.profiles.$user_id_1 dup key: { : 212 }
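A toy model of what the unique index does on each insert. The classes are hypothetical (DuplicateKey stands in for pymongo.errors.DuplicateKeyError): the index is checked before the write is kept, so a duplicate leaves the collection unchanged. As in a non-sparse MongoDB unique index, a document missing the field is indexed under null, so a second such document would also be rejected.

```python
class DuplicateKey(Exception):
    """Toy stand-in for pymongo.errors.DuplicateKeyError."""

class ToyUniqueIndex:
    """Hypothetical unique index on one field: maps field value -> document."""

    def __init__(self, field):
        self.field = field
        self.entries = {}

    def check_and_add(self, doc):
        key = doc.get(self.field)  # a missing field is indexed as None (null)
        if key in self.entries:
            raise DuplicateKey(f"dup key: {{ {self.field}: {key!r} }}")
        self.entries[key] = doc

class ToyCollection:
    def __init__(self, unique_field):
        self.docs = []
        self.index = ToyUniqueIndex(unique_field)

    def insert_one(self, doc):
        self.index.check_and_add(doc)  # rejected documents never reach self.docs
        self.docs.append(doc)

profiles = ToyCollection("user_id")
for doc in [{"user_id": 211, "name": "Luke"}, {"user_id": 212, "name": "Ziltoid"}]:
    profiles.insert_one(doc)
profiles.insert_one({"user_id": 213, "name": "Drew"})  # this is fine
try:
    profiles.insert_one({"user_id": 212, "name": "Tommy"})
except DuplicateKey as exc:
    print("rejected:", exc)  # rejected: dup key: { user_id: 212 }
```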

Translation completed 2017.6.3.