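Hands-on: storing and reading icbase data with MongoDB

Installing MongoDB locally

1. Install on Ubuntu
  sudo apt-get install mongodb  (this pulls in quite a few dependency packages)
  The service starts automatically once the installation finishes.

  Start: sudo service mongodb start
  Stop: sudo service mongodb stop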

  xiaofei@xiaofei-desktop:~$ ps aux | grep mongodb
mongodb  12470  0.0  0.1  70348  3624 ?        Ssl  16:20   0:00 /usr/lib/mongodb/mongod --config /etc/mongodb.conf
xiaofei  12496  0.0  0.0   3544   832 pts/3    S+   16:20   0:00 grep --color=auto mongodb

2. Take a look at /etc/mongodb.conf

# This is an config file for MongoDB master daemon mongod
# it is passed to mongod as --config parameter

logpath = /var/log/mongodb/mongod.log
dbpath = /var/lib/mongodb/

# use 'true' for options that don't take an argument     
logappend = true
bind_ip = 127.0.0.1
#noauth = true 

3. Open http://127.0.0.1:28017/ in a browser to see some basic system information about the database

Connecting to MongoDB and creating a new database

1. Install pymongo
  easy_install pymongo

  Connect to MongoDB
  >>> from pymongo import Connection
  >>> conn = Connection("127.0.0.1",27017)
  
  Get the icbase database (it is created if it does not exist yet)
  >>> db = conn.icbase
  >>> db
  Database(Connection('127.0.0.1', 27017), u'icbase')

  Insert a document into the attrs collection of db
  >>> attrs = db.attrs
  >>> import datetime
  >>> attr = {'author':'Mike','text':'My first blog post!','tags':["mongodb", "python", "pymongo"],'date':datetime.datetime.utcnow()}
  >>> attrs.insert(attr)
  ObjectId('500e65103ec1ee314c000001')

  Query
  >>> attrs.find_one()
  {u'date': datetime.datetime(2012, 7, 24, 9, 3, 55, 440000), u'text': u'My first blog post!', u'_id': ObjectId('500e65103ec1ee314c000001'), u'author': u'Mike', u'tags': [u'mongodb', u'python', u'pymongo']}

  >>> attrs.find_one({'author':'mike'})
  >>> attrs.find_one({'author':'Mike'})
  {u'date': datetime.datetime(2012, 7, 24, 9, 3, 55, 440000), u'text': u'My first blog post!', u'_id': ObjectId('500e65103ec1ee314c000001'), u'author': u'Mike', u'tags': [u'mongodb', u'python', u'pymongo']}

  Delete
  >>> attrs.remove({'author':'Mike'})
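
  The session above uses the pymongo API as it looked in 2012. The Connection class and the insert()/remove() methods were later removed from pymongo (in 3.0 and 4.0 respectively), so on a current driver the same round trip could look roughly like the sketch below. It assumes a recent pymongo (pip install pymongo) and a mongod listening on 127.0.0.1:27017; the function name roundtrip_demo is made up for illustration.

```python
import datetime


def roundtrip_demo(host="127.0.0.1", port=27017):
    """Insert, look up and delete one document (needs a running mongod)."""
    # Imported lazily so that defining this function does not require pymongo.
    from pymongo import MongoClient

    client = MongoClient(host, port)       # replaces the old Connection class
    attrs = client["icbase"]["attrs"]      # database and collection are created lazily

    doc = {
        "author": "Mike",
        "text": "My first blog post!",
        "tags": ["mongodb", "python", "pymongo"],
        "date": datetime.datetime.now(datetime.timezone.utc),
    }
    inserted_id = attrs.insert_one(doc).inserted_id        # replaces insert()
    found = attrs.find_one({"author": "Mike"})              # matching is case-sensitive
    deleted = attrs.delete_many({"author": "Mike"}).deleted_count  # replaces remove()
    client.close()
    return inserted_id, found, deleted
```

  As in the original session, a query for {'author': 'mike'} would return None, because string matching is case-sensitive.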

Writing a script to insert the icbase data into MongoDB

{'ic_id':xxx,'ic_partno':xxx,'ic_mfr':mfr_id,'attrname1_id':attrvalue1,'attrname2_id':attrvalue2,......}

*mfr_id: the id in icbase_mfr
*attrname1_id: the id in icbase_attrname
*attrvalue1: the value in icbase_icattrvalue

{ "_id" : ObjectId("500f6a7b3ec1ee0edd0009f1"), "600" : "100 Ohms", "669" : "PTC", "3862" : "1", "706" : "20 %", "577" : "Thermistors - PTC", "576" : "PTC", "136" : "300 mAmps", "ic_partno" : "PRG18BB101MB1RB", "ic_id" : 2545, "695" : "SMD/SMT", "3576" : "http://www.murata.com/products/catalog/pdf/r90e.pdf", "540" : "Reel", "3856" : "http://cn.mouser.com/Catalog/Simplified_Chinese/638/384.pdf", "3854" : "Thermistors - PTC 100 OHM 24V", "3855" : "81-PRG18BB101MB1RB", "3853" : "PRG18BB101MB1RB", "739" : "24 V", "629" : "PRG", "501" : "- 10 C to + 60 C", "612" : "是", "321" : "Murata", "161" : "0.8 mm W x 1.6 mm L x 0.8 mm H", "ic_mfr" : 119, "538" : "0603" }
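
The import script itself is not shown here, so the following is only a minimal sketch of how one part and its attribute rows could be flattened into the layout above. The function name build_ic_document and the (attrname_id, value) row shape are assumptions for illustration, not the original script.

```python
def build_ic_document(ic_id, ic_partno, mfr_id, attr_rows):
    """Flatten one part and its attribute rows into a single dict.

    attr_rows is an iterable of (attrname_id, value) pairs; this row
    shape is an assumption made for the sketch.
    """
    doc = {"ic_id": ic_id, "ic_partno": ic_partno, "ic_mfr": mfr_id}
    for attrname_id, value in attr_rows:
        # MongoDB field names must be strings, so the numeric id is stringified.
        doc[str(attrname_id)] = value
    return doc


doc = build_ic_document(
    2545, "PRG18BB101MB1RB", 119,
    [(600, "100 Ohms"), (669, "PTC"), (538, "0603")],
)
print(doc)
```

Keeping the attribute id as the field name means a part only carries the attributes it actually has, which is what the sample document above looks like.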

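Importing all of the data was expected to take 30-40 minutes.

Actual result: about 2.73 million records, total time 0:50:09.411991

Trying out this data in icgoo

1. When reading the detailed parameters of each part number;
2. When filtering by detailed parameters;


Before changing anything, I re-checked the places that fetch product detail parameters and removed every unnecessary database read:
only attr_name_id is used, and the AttrName object is no longer looked up by that id inside the loop.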

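1. product.models
  class Product:
  has an icbase method:
        #ic = IC.objects.get(pk=self.ic_id)
        #if ic:
        return self.ic.attrs
  self.ic is already the foreign-key object and can be used directly, so the first two lines are commented out. Accessing the foreign key appears to issue the same query as the commented-out code, though, so this change probably makes no real difference.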

2. When looping over the part numbers during parameter filtering
                #try:
                    #obj_key = AttrName.objects.get(id=key)
                #except:
                    #continue
    Previously each key was used to fetch an object from AttrName. Now no object is fetched and only the id key is kept.
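
    The icgoo filtering code is not shown in full, so this is only a sketch of the idea: group the attribute values by the id key as stored in each document, with no per-key AttrName query inside the loop. The names collect_filter_values and FIXED_FIELDS are hypothetical.

```python
from collections import defaultdict

# Fields that describe the part itself rather than one of its attributes.
FIXED_FIELDS = {"_id", "ic_id", "ic_partno", "ic_mfr"}


def collect_filter_values(docs):
    """Map each attribute id (a string key, as stored) to the set of values seen."""
    values_by_attr_id = defaultdict(set)
    for doc in docs:
        for key, value in doc.items():
            if key in FIXED_FIELDS:
                continue
            # No AttrName lookup here: the id itself is the grouping key.
            values_by_attr_id[key].add(value)
    return dict(values_by_attr_id)


parts = [
    {"_id": 1, "ic_id": 1, "ic_partno": "A", "ic_mfr": 94, "538": "0603", "706": "10 %"},
    {"_id": 2, "ic_id": 2, "ic_partno": "B", "ic_mfr": 94, "538": "1812", "706": "10 %"},
]
facets = collect_filter_values(parts)
```

    If display names are needed, they can be resolved once for the collected ids after the loop instead of once per key per part.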

Checking the state of my MongoDB

> db.attrs.stats()
{
"ns" : "icbase.attrs",
"count" : 2731252,
"size" : 1415603180,
"storageSize" : 1558881712,
"nindexes" : 1,
"ok" : 1
}
> db.attrs.totalIndexSize()
208570256
> db.attrs.getIndexes()
[
{
"name" : "_id_",
"ns" : "icbase.attrs",
"key" : {
"_id" : ObjectId("000000000000000000000000")
}
}
]


Creating an index on ic_id failed

> db.attrs.ensureIndex({ 'ic_id' : 1 })
Thu Jul 26 14:07:41 MessagingPort recv() error "Connection reset by peer" (104) 127.0.0.1:27017
Thu Jul 26 14:07:41 JS Error: Error: error doing query: failed (anon):100
Thu Jul 26 14:07:41 trying reconnect to 127.0.0.1
Thu Jul 26 14:07:41 reconnect 127.0.0.1 ok
Thu Jul 26 14:07:41 JS Error: Error: error doing query: failed (anon):100


A look at the log:

Thu Jul 26 14:49:03 building new index on { ic_id: 1.0 } for icbase.attrs...
Thu Jul 26 14:49:03 Buildindex icbase.attrs idxNo:1 { ns: "icbase.attrs", key: { ic_id: 1.0 }, name: "ic_id_1" }
1166400/2731251 42%
Thu Jul 26 14:49:16 shutdown: going to flush oplog...
Thu Jul 26 14:49:16 shutdown: going to close sockets...
Thu Jul 26 14:49:16 shutdown: waiting for fs...
Thu Jul 26 14:49:16 shutdown: closing all files...
Thu Jul 26 14:49:16 closeAllFiles() finished
Thu Jul 26 14:49:16 connection accepted from 127.0.0.1:33860 #2
Thu Jul 26 14:49:16 shutdown: removing fs lock...
Thu Jul 26 14:49:16 Listener on port 27017 aborted
Thu Jul 26 14:49:16 dbexit: really exiting now
ERROR: Client::shutdown not called!
Thu Jul 26 14:49:48 Mongo DB : starting : pid = 7291 port = 27017 dbpath = /var/lib/mongodb/ master = 0 slave = 0 32-bit

** NOTE: when using MongoDB 32 bit, you are limited to about 2 gigabytes of data
** see http://blog.mongodb.org/post/137788967/32-bit-limitations for more

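Presumably this is the 32-bit limit mentioned in the log: a 32-bit MongoDB can hold only about 2 GB of data. attrs was already about 1.5 GB, so by the time the index build reached 42% it had probably crossed that limit, which is why it failed.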
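
A rough back-of-the-envelope check with the figures reported by db.attrs.stats() and db.attrs.totalIndexSize() above. Treating the limit as exactly 2 GiB is an assumption (the log only says "about 2 gigabytes"), and the calculation ignores whatever extra space the index build and the data files need on top of the reported sizes.

```python
GIB = 1024 ** 3

storage_size = 1558881712   # "storageSize" from db.attrs.stats()
id_index_size = 208570256   # db.attrs.totalIndexSize(), i.e. the _id index
limit = 2 * GIB             # assumed exact value for the "about 2 gigabytes" limit

used = storage_size + id_index_size
headroom = limit - used

print(f"used:     {used / GIB:.2f} GiB")
print(f"headroom: {headroom / GIB:.2f} GiB")
```

On these numbers only a few hundred MiB remain below the assumed ceiling, so a second index over 2.7 million documents plus any working space for building it could plausibly run out of room. That is consistent with the explanation above but does not prove it.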

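By default every collection has a unique index on _id. If no _id is given when a document is inserted, one is generated automatically. To make full use of this existing index and reduce space overhead, it is best to supply your own unique key as _id. The object's own ID is usually a good fit, for example a product ID.

Space used by the _id index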
> db.attrs.totalIndexSize()
208570256

> db.attrs.dropIndex('_id_')
{ "nIndexesWas" : 1, "errmsg" : "may not delete _id index", "ok" : 0 }
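The _id index cannot be dropped.

*****
Alternatively, the data can be re-inserted with '_id' = ic_id set explicitly at insert time, so that the '_id_' index also serves as the ic_id index.


Rebuilt the data in a new collection in icbase, attrs2, with '_id' set to ic_id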

{ "_id" : 18, "740" : "50 Volts", "612" : "是", "ic_id" : 18, "695" : "SMD/SMT", "55" : "1000 pF", "706" : "10 %", "577" : "多层陶瓷电容 (MLCC) - SMD/SMT", "576" : "General Type MLCCs", "ic_mfr" : 94, "501" : "- 55 C to + 125 C", "540" : "Reel", "321" : "Kemet", "168" : "0.1", "688" : "C0G (NP0)", "ic_partno" : "C1812C102K5GACTU", "161" : "3.2 mm W x 4.5 mm L", "538" : "1812 (4532 metric)" }
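
A minimal sketch of the keying step, assuming each document already carries its ic_id as in the sample above. The helper name with_ic_id_as_primary_key is hypothetical and the rest of the import script is not shown.

```python
def with_ic_id_as_primary_key(doc):
    """Return a copy of doc whose '_id' is its 'ic_id'."""
    if "ic_id" not in doc:
        raise KeyError("document has no 'ic_id' to use as '_id'")
    keyed = dict(doc)             # leave the caller's dict untouched
    keyed["_id"] = doc["ic_id"]   # the built-in _id index now indexes ic_id
    return keyed


keyed = with_ic_id_as_primary_key(
    {"ic_id": 18, "ic_partno": "C1812C102K5GACTU", "ic_mfr": 94}
)
```

With documents keyed this way, a lookup by part id becomes a query on _id (for example {'_id': 18}) and needs no second index. Since _id must be unique within a collection, inserting two documents with the same ic_id would be rejected.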

> db.attrs2.stats()
{
"ns" : "icbase.attrs2",
"count" : 1468431,
"size" : 703207344,
"storageSize" : 726707424,
"nindexes" : 1,
"ok" : 1
}

Internal structure of MongoDB data files:

http://blog.nosqlfan.com/html/3515.html

Reposted from xiaolin0199.iteye.com/blog/2019168