Solr: Integrating Solr with MongoDB

Install Mongo-Connector

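pip install mongo-connector

Because the source needs patching (see below), uninstall the PyPI package and install from a Git checkout instead:

pip uninstall mongo-connector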

git clone https://github.com/10gen-labs/mongo-connector.git
cd mongo-connector
python setup.py install

Modify doc_managers/solr_doc_manager.py, switching some package imports to local imports (the original lines are kept as comments):

from mongo_connector import errors
#from mongo_connector.constants import (DEFAULT_COMMIT_INTERVAL,DEFAULT_MAX_BULK)
from constants import (DEFAULT_COMMIT_INTERVAL,DEFAULT_MAX_BULK)
from mongo_connector.util import retry_until_ok
#from mongo_connector.doc_managers import DocManagerBase, exception_wrapper
from doc_managers import DocManagerBase, exception_wrapper
#from mongo_connector.doc_managers.formatters import DocumentFlattener
from doc_managers.formatters import DocumentFlattener
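An alternative to editing the imports by hand is to try the package path first and fall back to the local module. A minimal sketch using the standard importlib module; `import_first` is a hypothetical helper, and the module names in the usage comment assume the source layout shown above:

```python
import importlib


def import_first(*module_names):
    """Return the first module in module_names that can be imported.

    Hypothetical helper: e.g. try "mongo_connector.constants" (installed
    package) before "constants" (local source tree).
    """
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            # Remember why this candidate failed, then try the next one
            errors.append("%s: %s" % (name, exc))
    raise ImportError("none of the modules could be imported: " + "; ".join(errors))


# Usage inside solr_doc_manager.py (assumed layout):
# constants = import_first("mongo_connector.constants", "constants")
# DEFAULT_COMMIT_INTERVAL = constants.DEFAULT_COMMIT_INTERVAL
```

The trade-off is that any ImportError raised inside the first module also triggers the fallback, so a broken installed package would silently be skipped.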


Test

#mongo-connector


MongoDB Replica set

1. Create the following directory structure:

rs
├── db
│   ├── rs1
│   │   ├── journal
│   │   └── _tmp
│   └── rs2
│       └── journal
└── log

2. Start two mongod instances:

#cd rs

#mongod --port 27001 --oplogSize 100 --dbpath db/rs1 --logpath log/rs1.log --replSet rs/127.0.0.1:27002 --journal 

#mongod --port 27002 --oplogSize 100 --dbpath db/rs2 --logpath log/rs2.log --replSet rs/127.0.0.1:27001 --journal

3. Configure the replica set:

#mongo --port 27001

>config={_id:'rs', members:[{_id:0, host:'localhost:27001'},{_id:1, host:'localhost:27002'}]}

>rs.initiate(config)

>rs.status()

Once the replica set is initiated, the shell prompt changes to:

rs:PRIMARY>
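The three steps above can be mirrored in Python for scripting. A sketch under stated assumptions: `make_rs_dirs`, `build_rs_config` and `find_primary` are hypothetical helpers, the config document matches the one passed to rs.initiate() above, and `find_primary` reads the `members`/`stateStr`/`name` fields of a replSetGetStatus result (the same data rs.status() prints):

```python
import os

# Directory tree from step 1, relative to the "rs" base directory
RS_DIRS = ["db/rs1/journal", "db/rs1/_tmp", "db/rs2/journal", "log"]


def make_rs_dirs(base="rs"):
    """Create the step-1 directory tree under base; safe to re-run."""
    for rel in RS_DIRS:
        os.makedirs(os.path.join(base, *rel.split("/")), exist_ok=True)


def build_rs_config(name, ports, host="localhost"):
    """Build the document passed to rs.initiate() in step 3."""
    return {
        "_id": name,
        "members": [
            {"_id": i, "host": "%s:%d" % (host, port)}
            for i, port in enumerate(ports)
        ],
    }


def find_primary(status):
    """Return host:port of the PRIMARY member in a replSetGetStatus result, or None."""
    for member in status.get("members", []):
        if member.get("stateStr") == "PRIMARY":
            return member.get("name")
    return None


# With a running replica set and pymongo installed (assumed), the status
# document can be fetched like this:
#   from pymongo import MongoClient
#   status = MongoClient("localhost", 27001).admin.command("replSetGetStatus")
#   print(find_primary(status))
```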

Dump data from MongoDB to Solr

#python connector.py --unique-key=id --auto-commit-interval=0 -n test.test  -m localhost:27001 -t http://localhost:8983/solr/inokqreply -d solr_doc_manager.py

or

#mongo-connector   --auto-commit-interval=0 -n test.test  -m localhost:27001 -t http://localhost:8983/solr/inokqreply -d doc_managers/solr_doc_manager.py
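For scripting, the same command line can be assembled as an argument list. A minimal sketch; `build_connector_cmd` is a hypothetical helper that only uses the flags appearing above: -m (MongoDB address), -t (Solr core URL), -n (namespace, db.collection), -d (doc manager file), plus --unique-key and --auto-commit-interval:

```python
def build_connector_cmd(mongo_addr, solr_url, namespace, doc_manager,
                        unique_key=None, auto_commit_interval=0):
    """Return the argv list for the mongo-connector command shown above."""
    cmd = ["mongo-connector",
           "--auto-commit-interval=%d" % auto_commit_interval,
           "-n", namespace,
           "-m", mongo_addr,
           "-t", solr_url,
           "-d", doc_manager]
    if unique_key is not None:
        # Same position as in the connector.py variant of the command
        cmd.insert(1, "--unique-key=%s" % unique_key)
    return cmd
```

Passing the result to `subprocess.check_call(...)` would launch the connector without any shell quoting issues.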

------------------------------------------------------------------------------------------------------------------------------------

error:




Modify pysolr.py in python2.7/site-packages:

    for bit in values:
        if self._is_null_value(bit):
            continue

        # Wrap the keys in str(): pysolr uses unicode literals, and older
        # Python 2 releases can reject unicode keyword names in ET.Element(**attrs).
        #attrs = {'name': key}
        attrs = {str('name'): key}

        if boost and key in boost:
            #attrs['boost'] = force_unicode(boost[key])
            attrs[str('boost')] = force_unicode(boost[key])

        field = ET.Element('field', **attrs)
        field.text = self._from_python(bit)

        doc_elem.append(field)

See the related pysolr issue: https://github.com/toastdriven/pysolr/issues/72

error:

Solution: delete the config.txt file in the directory from which the command above was launched.
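The cleanup can be scripted before restarting the connector. A minimal sketch; `remove_progress_file` is a hypothetical helper, and the file name is the config.txt mentioned above:

```python
import os


def remove_progress_file(directory=".", name="config.txt"):
    """Delete the stale config.txt in the directory the connector was started from.

    Returns True if a file was removed, False if there was nothing to delete.
    """
    path = os.path.join(directory, name)
    if os.path.isfile(path):
        os.remove(path)
        return True
    return False
```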

error:

  File "build/bdist.linux-x86_64/egg/pysolr.py", line 318, in _send_request
    error_message = self._extract_error(resp)
  File "build/bdist.linux-x86_64/egg/pysolr.py", line 397, in _extract_error
    reason, full_html = self._scrape_response(resp.headers, resp.content)
  File "build/bdist.linux-x86_64/egg/pysolr.py", line 418, in _scrape_response
    import lxml.html
ImportError: No module named lxml.html
2014-07-31 14:29:51,872 - ERROR - OplogThread: Failed during dump collection cannot recover! Collection(Database(MongoClient([u'localhost:27017', u'localhost:27018']), u'local'), u'oplog.rs')

Solution:

yum install python-lxml

yum install libxml2-python

yum install libxml2-devel libxslt-devel

pip install lxml   (or pin a version: pip install lxml==3.2.4)

pip install cssselect

#ln -s /usr/local/python27/lib/libpython2.7.so /usr/local/lib/
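To confirm the fix before restarting the connector, a small check can try the imports pysolr needs here: lxml.html (the failing import in the traceback) and cssselect (used by lxml's .cssselect() calls). A sketch with a hypothetical helper name, `check_modules`:

```python
import importlib


def check_modules(names):
    """Map each module name to True if it imports cleanly, else False."""
    result = {}
    for name in names:
        try:
            importlib.import_module(name)
            result[name] = True
        except ImportError:
            result[name] = False
    return result


if __name__ == "__main__":
    for name, ok in sorted(check_modules(["lxml.html", "cssselect"]).items()):
        print("%-10s %s" % (name, "OK" if ok else "MISSING"))
```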

error:

File "build/bdist.linux-x86_64/egg/pysolr.py", line 318, in _send_request
    error_message = self._extract_error(resp)
  File "build/bdist.linux-x86_64/egg/pysolr.py", line 397, in _extract_error
    reason, full_html = self._scrape_response(resp.headers, resp.content)
  File "build/bdist.linux-x86_64/egg/pysolr.py", line 429, in _scrape_response
    p_nodes = body_node.cssselect('p')
AttributeError: 'NoneType' object has no attribute 'cssselect'
2014-07-31 17:29:25,320 - ERROR - OplogThread: Failed during dump collection cannot recover! Collection(Database(MongoClient([u'localhost:27017', u'localhost:27018']), u'local'), u'oplog.rs')

Solution:

Apply the fix from pysolr pull request #92:

https://github.com/toastdriven/pysolr/pull/92

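The traceback shows `body_node` is None, presumably because the error page returned by Solr has no body element that pysolr can find. A minimal sketch of the kind of guard that avoids the AttributeError; `extract_paragraph_text` is a hypothetical function, not the code from the pull request:

```python
def extract_paragraph_text(body_node):
    """Collect the text of <p> elements under body_node.

    body_node is expected to be an lxml.html element, or None when no body
    was found, which is the case that raises the AttributeError above.
    """
    if body_node is None:
        return []
    return [p.text_content() for p in body_node.cssselect("p")]
```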

 ==========================================================

pysolr

https://pypi.python.org/pypi/pysolr/3.2.0

===============================================================================

datetime

In my case, the 'created_at' field stores a Unix timestamp as a long integer.

When this data is imported into Solr with mongo-connector, Solr reports the error "Invalid Date String".

Solution:

1. Uninstall mongo-connector:

#pip uninstall mongo-connector

2. Modify mongo_connector/doc_managers/formatters.py:

    import datetime  # at the top of formatters.py, if not already imported

    def transform_dateformat(self, value):
        # Convert a Unix timestamp in seconds to a datetime. utcfromtimestamp
        # is used because pysolr serializes datetimes with a trailing "Z" (UTC);
        # fromtimestamp() would return local time and shift the stored value.
        return datetime.datetime.utcfromtimestamp(int(value))

    def transform_element(self, key, value):
        if isinstance(value, list):
            for li, lv in enumerate(value):
                for inner_k, inner_v in self.transform_element(
                        "%s.%s" % (key, li), lv):
                    yield inner_k, inner_v
        elif isinstance(value, dict):
            formatted = self.format_document(value)
            for doc_key in formatted:
                yield "%s.%s" % (key, doc_key), formatted[doc_key]
        else:
            # We assume that transform_value will return a 'flat' value,
            # not a list or dict
            if key == "created_at":
                # created_at holds a Unix timestamp; Solr needs a real date
                yield key, self.transform_dateformat(value)
            else:
                yield key, self.transform_value(value)

3. Reinstall:

#python setup.py install

After reinstalling, the import works without errors.
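Solr expects dates in ISO 8601 UTC form with a trailing Z (e.g. 1995-12-31T23:59:59Z), which is why a raw long is rejected. A standalone sketch of the same conversion, handy for checking a stored value by hand; `unix_to_solr_date` is a hypothetical name, and it assumes the timestamp is in seconds (a millisecond value would need dividing by 1000 first):

```python
from datetime import datetime, timezone


def unix_to_solr_date(value):
    """Convert a Unix timestamp in seconds (int or numeric string) to a
    Solr date string such as 1970-01-01T00:00:00Z, always in UTC."""
    dt = datetime.fromtimestamp(int(value), tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")
```

For example, `unix_to_solr_date(1406788191)` gives `2014-07-31T06:29:51Z`.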

http://tool.chinaz.com/Tools/unixtime.aspx

http://developwithstyle.com/articles/2010/07/09/handling-dates-in-mongodb/

https://wiki.python.org/moin/TimeTransitionsImage

===================================================================================

This Stack Overflow post suggests another tool, a MongoDB data import handler (DIH) for Solr:

http://stackoverflow.com/questions/9345335/solr-data-import-handlers-for-mongodb

#git clone https://github.com/james75/SolrMongoImporter

 -------------------------

init script

https://gist.github.com/lovett89/9260081

http://www.snip2code.com/Snippet/33459/mongo-connector-init-script-%28tested-on-C

https://github.com/10gen-labs/mongo-connector/issues/96

  1. Modify the variables at the top of mongo-connector.start to your liking
  2. Modify the wrapper variable at the top of the init script to point to the location of mongo-connector.start
  3. Place the mongo-connector script in /etc/init.d and run chkconfig --add mongo-connector

When I ran chkconfig --add mongo-connector, the system reported that the chkconfig command does not exist.

Solution:

sudo apt-get install sysv-rc-conf

=================================================================================

mongodb commands

http://blog.csdn.net/wangpeng047/article/details/7705588



Reposted from ylzhj02.iteye.com/blog/2090355